Chapter 1: Introduction 


Pattern recognition techniques associate a symbolic identity with the image of a pattern. In this work we analyze different neural network methods in pattern recognition. The machine (computer) recognition problem considered here involves machine-printed patterns. This kind of pattern recognition is better known as optical pattern recognition, since it deals with the recognition of optically processed rather than magnetically processed patterns [1]. Although the origin of character recognition can be traced as far back as 1870, it first appeared as an aid to the visually handicapped, and the first successful attempt was made by the Russian scientist Tyurin in 1900 [2]. The modern version of optical character recognition (OCR) appeared in the middle of the 1940s with the development of digital computers. From then on, it was regarded as a data processing approach with applications in the business world. The principal motivation for the development of OCR systems is the need to cope with the enormous flood of paper, such as bank cheques, commercial forms, government records, credit card imprints and mail for sorting, generated by an expanding technological society. OCR machines have been commercially available since the middle of the 1950s. Since then, extensive research has been carried out, and a large number of technical papers and reports have been published by various researchers in the area of pattern recognition. Several books have been published on optical character recognition [3-11]. Special issues and reports on the topic have also repeatedly appeared in the proceedings of the International Joint Conferences on Pattern Recognition and of the International Conferences on Systems, Man and Cybernetics. Research work also appears in various other conferences, such as the British Conference on Pattern Recognition and the Scandinavian Conference on Image Analysis.

1.1 Applications of Pattern Recognition Technology

Pattern recognition has many practical applications:

• Use by blind people: as a reading aid using a photosensor and tactile stimulators, and as a sensory aid with sound output [12-13].

• Use as a telecommunication aid for the deaf [14].

• Use in the postal department: for postal address reading, and as a reader for handwritten and printed postal codes [15-17].

• Use in the publishing industry [18], and as a reader for data communication terminals [19].

• For direct processing of documents: as a multipurpose document reader for large-scale data processing, as a microfilm reader in data input systems, for high-speed data entry, for converting graphics into a computer-readable form, and as an electronic page reader to handle large volumes of mail [20-21].

• For use in customer billing, as in telephone exchange billing systems [22], order data logging [23], automatic fingerprint identification [24], as an automatic inspection system (I.C. mask inspection and defect detection in microcircuits) [25], and as a credit card scanner in credit and personal identification systems.

• For business applications: financial applications such as cheque sorting strategy optimization [26, 27].

• For digital barcode reading, and as a handwriting analyzer for automatic writer recognition and signature verification [28, 29].

• For shorthand transcription, in the electronic packaging industry, and for reading characters stamped on metallic parts.


Pattern recognition is the recognition or separation of one particular sequence of bits or pattern from other such patterns. Pattern recognition (PR) applications have been varied, and so have the associated data structures and processing paradigms [30]. In the course of time, four significant approaches to PR have evolved. These are:

Statistical pattern recognition: Here, the problem is posed as one of composite hypothesis testing, each hypothesis pertaining to the premise of the datum having originated from a particular class, or as one of regression from the space of measurements to the space of classes. The statistical methods for solving this problem involve the computation of the class-conditional probability densities, which remains the main hurdle in this approach. The statistical approach is one of the oldest and is still widely used [31].
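As a concrete illustration of the statistical approach, the sketch below estimates class-conditional densities from labelled samples and assigns a new measurement to the class with the largest posterior score (the Bayes decision rule). It assumes, purely for illustration, that each class-conditional density is Gaussian with independent features; the function names and the sample data are hypothetical.

```python
import math
from collections import defaultdict

def fit_class_densities(samples):
    """Estimate prior, per-feature mean and per-feature variance for each class.

    Assumes (for illustration only) a Gaussian class-conditional density
    with independent features."""
    grouped = defaultdict(list)
    for x, label in samples:
        grouped[label].append(x)
    model = {}
    for label, xs in grouped.items():
        d = len(xs[0])
        means = [sum(x[k] for x in xs) / len(xs) for k in range(d)]
        # A small floor keeps every variance strictly positive.
        variances = [max(sum((x[k] - means[k]) ** 2 for x in xs) / len(xs), 1e-6)
                     for k in range(d)]
        model[label] = (len(xs) / len(samples), means, variances)
    return model

def log_score(x, prior, means, variances):
    """log P(class) + log p(x | class) under the assumed Gaussian density."""
    score = math.log(prior)
    for xk, mk, vk in zip(x, means, variances):
        score -= 0.5 * math.log(2 * math.pi * vk) + (xk - mk) ** 2 / (2 * vk)
    return score

def classify(x, model):
    """Pick the class with the largest posterior score."""
    return max(model, key=lambda label: log_score(x, *model[label]))

# Hypothetical two-feature measurements from two classes.
training = [([0.0, 0.1], "A"), ([0.2, -0.1], "A"), ([0.1, 0.0], "A"),
            ([2.0, 2.1], "B"), ([1.9, 2.2], "B"), ([2.1, 1.8], "B")]
model = fit_class_densities(training)
print(classify([0.05, 0.0], model), classify([2.0, 2.0], model))
```

The densities here come from the simplest parametric model; the hurdle mentioned above appears when such simple assumptions about the densities do not hold.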

Syntactic pattern recognition: In syntactic pattern recognition, each pattern is assumed to be composed of sub-patterns or primitives strung together in accordance with the generation rules of a grammar characteristic of the associated class. Class identification is accomplished by way of parsing operations using automata corresponding to the various grammars [32, 33]. Parser design and grammatical inference are two difficult issues associated with this approach to PR and are responsible for its somewhat limited applicability.

Knowledge-based pattern recognition: This approach to PR [34] evolved from advances in rule-based systems in artificial intelligence (AI). Each rule is in the form of a clause that reflects evidence about the presence of a particular class. The sub-problems spawned by the methodology are:

1. how the rule base may be constructed, and

2. what mechanism might be used to integrate the evidence yielded by the invoked rules.

Neural Pattern Recognition: Artificial neural networks (ANNs) provide an emerging paradigm in pattern recognition. The field of ANNs encompasses a large variety of models [35], all of which have two important characteristics:

1. They are composed of a large number of structurally and functionally similar units called neurons, usually connected in various configurations by weighted links.

2. The ANN's model parameters are derived from supplied input/output paired data sets by an estimation process called training.

ANNs have been developed as a class of trainable, model-free estimators in an attempt to emulate the working of naturally intelligent systems (e.g., the human brain).

A neural network is a processing device whose design was inspired by the design and functioning of the human brain and its components. There is no idle memory containing data and programs; instead, each neuron is preprogrammed and continuously active.

Neural networks have many applications. The most likely applications for neural networks are (1) classification, (2) association and (3) reasoning. One of the applications of neural networks is in the field of pattern recognition. Pattern recognition is a branch of artificial intelligence concerned with the classification or description of observations.

Its aim is to classify patterns based either on a priori knowledge or on the features extracted from the patterns. For pattern recognition, there should be techniques to describe a large number of similar structures of the same category while allowing distinct descriptions among different categories [36]. It is expected that if the features people use to recognize patterns are properly given to the pattern recognition algorithms of a neural network, then the network should perform like a human.

Methodology: There are many neural network algorithms for pattern recognition. The various algorithms differ in their learning mechanism. Learning can be either supervised or unsupervised. In supervised learning, the training set contains both inputs and the required responses; after training, the network should produce a response equal to the target response. Unsupervised classification learning is based on clustering of input data. There is no prior information about an input's membership in a particular class; the characteristics of the patterns and a history of training are used to assist the network in defining classes. This unsupervised classification is called clustering.

The characteristics of the neurons and the initial weights are specified based on the training method of the network. The pattern sets are applied to the network during training. The patterns to be recognized are in the form of vectors whose elements are obtained from a pattern grid. The elements are either 0 and 1 or -1 and 1. In some algorithms the weights are calculated from the patterns presented to the network, and in others the weights are initialized. The network acquires knowledge from the environment: it stores the patterns presented during training, or, in other words, it extracts the features of the patterns. As a result, the information can be retrieved later.
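One way a pattern drawn on a grid might be turned into the binary (0/1) and bipolar (-1/1) vectors described above is to read the grid row by row, as sketched here. The 5 x 3 letter, the `#` marker for a filled cell and the helper names are hypothetical; the grid size used in the thesis is not given at this point.

```python
def grid_to_binary(rows, on="#"):
    """Flatten a character grid row by row: 1 for a filled cell, 0 otherwise."""
    return [1 if cell == on else 0 for row in rows for cell in row]

def binary_to_bipolar(vector):
    """Map {0, 1} to {-1, 1} using x -> 2x - 1."""
    return [2 * x - 1 for x in vector]

# A hypothetical 5 x 3 grid for the letter "T".
LETTER_T = ["###",
            ".#.",
            ".#.",
            ".#.",
            ".#."]

binary = grid_to_binary(LETTER_T)
bipolar = binary_to_bipolar(binary)
print(binary)   # [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
print(bipolar)  # [1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, -1, -1, 1, -1]
```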

1.2 The Neural Network Algorithms Used In Pattern Recognition 

We use the following neural network algorithms for pattern recognition:

1. Bidirectional Associative Memory (BAM)

2. Hopfield Autoassociative Memory

3. Feature Recognition Neural Network (FRNN)

4. Hamming Network and MAXNET

5. Backpropagation Algorithm

6. Quickprop

7. Adaptive Resonance Theory (ART1)

8. Kohonen Self-Organizing Map

9. Neocognitron

The bidirectional associative memory performs heteroassociative processing, in which associations between pattern pairs are stored. The Hopfield network is based on autoassociation: it associates the input pattern with the closest stored pattern. The ART1 and Kohonen networks act as classifiers: they classify a set of patterns, and the network recalls the information regarding class membership of the stored patterns when any of them is presented to the network later. The Hamming network performs classification based on the Hamming distance between the input and the stored patterns. The neocognitron applies the concept of competitive learning.
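Classification by Hamming distance can be sketched as follows for bipolar patterns. A Hamming network computes a matching score that grows as the distance shrinks, and MAXNET then picks the largest score; in this sketch the selection is simply done with Python's `min` over distances, standing in for MAXNET. The score identity holds for any bipolar vectors; the names and patterns are hypothetical.

```python
def hamming_distance(x, s):
    """Number of positions at which two equal-length vectors differ."""
    return sum(1 for xi, si in zip(x, s) if xi != si)

def matching_score(x, s):
    """For bipolar vectors of length n, (n + x.s) / 2 equals n - HD(x, s)."""
    dot = sum(xi * si for xi, si in zip(x, s))
    return (len(x) + dot) // 2   # n + x.s is always even for bipolar vectors

def nearest_class(x, prototypes):
    """Label of the stored prototype with the smallest Hamming distance to x."""
    return min(prototypes, key=lambda label: hamming_distance(x, prototypes[label]))

stored = {"A": [1, 1, 1, -1], "B": [-1, -1, 1, 1]}   # hypothetical stored patterns
probe = [1, 1, -1, -1]
print({k: hamming_distance(probe, v) for k, v in stored.items()})  # {'A': 1, 'B': 4}
print(nearest_class(probe, stored))                                # A
```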

1.3 Problem Statement


Neural networks have demonstrated their capability for solving complex pattern recognition problems. However, the pattern recognition problems commonly solved have limited scope: a single neural network architecture can recognize only a few patterns, and the relative performance of various neural network algorithms has not been reported in the literature.

This thesis discusses various neural network algorithms, with their implementation details, for solving pattern recognition problems, and evaluates their relative performance. The algorithms have been compared on the basis of the following criteria:

(1) Noise in weights

(2) Noise in inputs

(3) Loss of connections

(4) Missing information and added information.
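How these perturbations are generated is not spelled out at this point. Purely as a hypothetical illustration, the first three criteria could be simulated on bipolar inputs and on a weight matrix as sketched below; the function names, the Gaussian noise model and the fractions are assumptions, not the procedure used in this thesis.

```python
import random

def flip_inputs(vector, fraction, rng):
    """Noise in inputs: reverse the sign of a randomly chosen fraction of bipolar components."""
    noisy = list(vector)
    for k in rng.sample(range(len(noisy)), round(fraction * len(noisy))):
        noisy[k] = -noisy[k]
    return noisy

def perturb_weights(weights, sigma, rng):
    """Noise in weights: add zero-mean Gaussian noise with standard deviation sigma."""
    return [[w + rng.gauss(0.0, sigma) for w in row] for row in weights]

def cut_connections(weights, fraction, rng):
    """Loss of connections: set a randomly chosen fraction of the weights to zero."""
    rows, cols = len(weights), len(weights[0])
    cut = set(rng.sample(range(rows * cols), round(fraction * rows * cols)))
    return [[0 if r * cols + c in cut else weights[r][c] for c in range(cols)]
            for r in range(rows)]

rng = random.Random(0)          # fixed seed so a run can be repeated
pattern = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1]
print(flip_inputs(pattern, 0.2, rng))   # two of the ten components are reversed
```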

1.4 Organization of the Thesis


A brief introduction to neural networks for pattern recognition is given in Chapter 1. Chapter 2 contains the fundamentals of neural networks. Chapter 3 presents in detail the neural network algorithms used in pattern recognition. In Chapter 4, the performance of the various algorithms under different criteria is studied and compared. The conclusions are given in Chapter 5, which also discusses topics for future work in this field.
Chapter 2: Fundamentals of Artificial Neural Networks


Artificial neural networks have been developed in a wide variety of configurations. Despite this apparent diversity, network paradigms have a great deal in common. In this chapter, recurring themes are briefly identified and discussed so that they will be familiar later. There are a number of possible answers to the question of how to define a neural network. At one extreme, the answer could be that neural networks are simply a class of mathematical algorithms, since a network can be regarded essentially as a graphic notation for a large class of algorithms. Such algorithms produce solutions to a number of specific problems. At the other extreme, the reply may be that these are synthetic networks that emulate the biological neural networks found in living organisms. In light of today's limited knowledge of biological neural networks and organisms, the more plausible answer seems to be closer to the algorithmic one.

2.1 Biological Neurons and Their Artificial Models 

A human brain consists of approximately $10^{11}$ computing elements called neurons. They communicate through a connection network of axons and synapses having a density of approximately $10^{4}$ synapses per neuron. Our hypothesis regarding the modeling of the natural nervous system is that neurons communicate with each other by means of electrical impulses [37-39]. The neurons operate in a chemical environment that is even more important in terms of actual brain behavior. We thus can consider the brain to be a densely connected electrical switching network conditioned largely by biochemical processes. The vast neural network has an elaborate structure with very complex interconnections. The input to the network is provided by sensory receptors. Receptors deliver stimuli both from within the body and from sense organs when the stimuli originate in the external world. The stimuli are in the form of electrical impulses that convey the information in the network of neurons. As a result of information processing in the central nervous system, the effectors are controlled and give human responses in the form of diverse actions.

We thus have a three-stage system, consisting of receptors, the neural network, and effectors, in control of the organism and its actions.



Fig. 2.1: Information Flow in the Nervous System

A lucid, although rather approximate, idea of the information links in the nervous system is shown in Fig. 2.1. As the figure shows, the information is processed, evaluated, and compared with the stored information in the central nervous system. When necessary, commands are generated there and transmitted to the motor organs. Notice that motor organs are monitored in the central nervous system by feedback links that verify their actions. Both internal and external feedback control the implementation of commands.

2.1.1 Biological Neuron 

The elementary nerve cell, called a neuron, is the fundamental building block of the biological neural network. Its schematic diagram is shown in Fig. 2.2. A typical cell has three major regions: the cell body, which is also called the soma, the axon, and the dendrites. Dendrites form a dendritic tree, which is a very fine bush of thin fibers around the neuron's body. Dendrites receive information from other neurons through axons, long fibers that serve as transmission lines. An axon is a long cylindrical connection that carries impulses from the neuron. The end part of an axon splits into a fine arborization. Each branch terminates in a small end bulb almost touching the dendrites of neighboring neurons. The axon-dendrite contact organ is called a synapse. The synapse is where the neuron introduces its signal to a neighboring neuron. The signals reaching a synapse and received by dendrites are electrical impulses.
Fig. 2.2: Schematic Diagram of a Neuron and a Sample of Pulse Train

2.1.2 Artificial Neuron 

The artificial neuron was designed to mimic the first-order characteristics of the biological neuron. In essence, a set of inputs is applied, each representing the output of another neuron. Each input is multiplied by a corresponding weight, analogous to a synaptic strength, and all of the weighted inputs are then summed to determine the activation level of the neuron. Fig. 2.3 shows a model that implements this idea. Despite the diversity of network paradigms, nearly all are based upon this configuration. Here, a set of inputs labeled $x_1, x_2, \ldots, x_n$ is applied to the artificial neuron. These inputs, collectively referred to as the vector X, correspond to the signals into the synapses of a biological neuron. Each signal is multiplied by an associated weight $w_1, w_2, \ldots, w_n$ before it is applied to the summation block, labeled Σ. Each weight corresponds to the "strength" of a single biological synaptic connection.

Fig. 2.3: Artificial Neuron

The summation block, corresponding roughly to the biological cell body, 
adds all of the weighted inputs algebraically, producing an output that we 
call NET. This may be compactly stated in vector notation as follows, 

NET = XW 
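For a single neuron, NET = XW is the dot product of the input vector and the weight vector. A minimal sketch (the function name and the numbers are hypothetical):

```python
def net_input(x, w):
    """NET = x1*w1 + x2*w2 + ... + xn*wn, the output of the summation block."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(xi * wi for xi, wi in zip(x, w))

print(net_input([1.0, 0.5, -1.0], [0.5, 2.0, 0.25]))  # 1.25
```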
Activation Function

The NET signal is usually further processed by an activation function F to produce the neuron's output signal, OUT. This may be a simple linear function,

OUT = K(NET)

where K is a constant; a threshold function,

OUT = 1 if NET > T
OUT = 0 otherwise

where T is a constant threshold value; or a function that more accurately simulates the nonlinear transfer characteristic of the biological neuron and permits more general network functions.
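The linear and threshold activation functions translate directly into code. In this sketch the default values of K and T are arbitrary assumptions; note that the threshold test is strict, so NET equal to T gives OUT = 0.

```python
def linear_activation(net, k=1.0):
    """OUT = K(NET) for a constant K."""
    return k * net

def threshold_activation(net, t=0.0):
    """OUT = 1 if NET > T, otherwise 0 (T is the constant threshold)."""
    return 1 if net > t else 0

print(linear_activation(1.25, k=2.0))       # 2.5
print(threshold_activation(1.25, t=1.0))    # 1
print(threshold_activation(1.0, t=1.0))     # 0, because the test is strictly NET > T
```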



Fig. 2.4: Artificial Neuron with Activation Function 

In Fig. 2.4 the block labeled F accepts the NET output and produces the signal labeled OUT. If the F processing block compresses the range of NET, so that OUT never exceeds some low limits regardless of the value of NET, F is called a squashing function. The squashing function is often chosen to be the logistic function, or "sigmoid" (meaning S-shaped). This function is expressed mathematically as

$$F(x) = \frac{1}{1 + e^{-x}}$$

Thus

$$\mathrm{OUT} = \frac{1}{1 + e^{-\mathrm{NET}}} = F(\mathrm{NET})$$

By analogy to analog electronic systems, we may think of the activation function as defining a nonlinear gain for the artificial neuron. This gain is calculated by finding the ratio of the change in OUT to a small change in NET. Thus, gain is the slope of the curve at a specific excitation level.

Small input signals require high gain through the network if they are to produce usable output; however, a large number of cascaded high-gain stages can saturate the output with the amplified noise (random variations) that is present in any realizable network. Also, large input signals will saturate high-gain stages, again eliminating any usable output. The central high-gain region of the logistic function solves the problem of processing small signals, while its regions of decreasing gain at positive and negative extremes are appropriate for large excitations.
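The gain of the logistic function can be computed in closed form, since its slope is dOUT/dNET = F(NET)(1 - F(NET)). The sketch below evaluates OUT and the gain at a few NET values; the function names are hypothetical, and the logistic is written in a form that avoids overflow for large negative NET.

```python
import math

def logistic(net):
    """F(NET) = 1 / (1 + exp(-NET)), arranged to avoid overflow for large |NET|."""
    if net >= 0:
        return 1.0 / (1.0 + math.exp(-net))
    z = math.exp(net)
    return z / (1.0 + z)

def gain(net):
    """Slope dOUT/dNET of the logistic curve, equal to F(NET) * (1 - F(NET))."""
    out = logistic(net)
    return out * (1.0 - out)

for net in (-6.0, -2.0, 0.0, 2.0, 6.0):
    print(f"NET = {net:5.1f}   OUT = {logistic(net):.4f}   gain = {gain(net):.4f}")
```

The gain peaks at 0.25 when NET = 0 and falls toward zero at both extremes, which is the behavior described in the paragraph above.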

Another commonly used activation function is the hyperbolic tangent. It is similar in shape to the logistic function and is often used by biologists as a mathematical model of nerve cell activation. Used as an artificial neural network activation function, it is expressed as follows:

$$\mathrm{OUT} = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
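A quick numerical check of this formula, and of why the hyperbolic tangent is similar in shape to the logistic function F: it is the logistic curve stretched to the range (-1, 1), tanh(x) = 2F(2x) - 1. The function names are hypothetical; only Python's standard `math` module is used.

```python
import math

def tanh_from_exponentials(x):
    """OUT = (e^x - e^-x) / (e^x + e^-x)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def logistic(x):
    """F(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    # The formula agrees with the library's tanh ...
    assert math.isclose(tanh_from_exponentials(x), math.tanh(x), abs_tol=1e-12)
    # ... and tanh is a rescaled logistic curve: tanh(x) = 2F(2x) - 1.
    assert math.isclose(math.tanh(x), 2.0 * logistic(2.0 * x) - 1.0, abs_tol=1e-12)
print("checks passed")
```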
2.1.3 Single Layer Artificial Neural Networks

Although a single neuron can perform certain simple pattern detection functions, the power of neural computation comes from connecting neurons into networks. The simplest network is a group of neurons arranged in a layer, as shown in Fig. 2.5. Note that the circular nodes on the left serve only to distribute the inputs: they perform no computation and hence will not be considered to constitute a layer. For this reason, they are shown as circles to distinguish them from the computing neurons, which are shown as squares. The set of inputs X has each of its elements connected to each artificial neuron through a separate weight. Early artificial neural networks were no more complex than this. Each neuron simply output a weighted sum of the inputs to the network. Actual artificial and biological networks may have many of the connections deleted, but full connectivity is shown for reasons of generality.



Fig. 2.5: Single Layer Neural Network

It is convenient to consider the weights to be elements of a matrix W. The dimensions of the matrix are m rows by n columns, where m is the number of inputs and n the number of neurons. With this arrangement, the NET values of the whole layer are given by the vector-matrix product XW.
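With W stored as m rows (one per input) by n columns (one per neuron), the NET value of neuron j is the sum over i of $x_i w_{ij}$. A plain-Python sketch of this vector-matrix product, with hypothetical weights:

```python
def layer_net(x, weights):
    """NET_j = sum over i of x_i * w_ij, with weights stored as m rows (inputs) by n columns (neurons)."""
    m, n = len(weights), len(weights[0])
    if len(x) != m:
        raise ValueError("x needs one entry per row of the weight matrix")
    return [sum(x[i] * weights[i][j] for i in range(m)) for j in range(n)]

W = [[1, 0, 2],
     [0, 1, -1]]   # hypothetical weights: m = 2 inputs, n = 3 neurons
print(layer_net([3, 4], W))   # [3, 4, 2]
```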

2.1.4 Multi Layer Artificial Neural Networks 

Multilayer networks may be formed by simply cascading a group of 
single layers; the output of one layer provides the input to the subsequent 
layer. 



Fig. 2.6: Two-Layer Neural Network

Nonlinear Activation Function

Multilayer networks provide no increase in computational power over a single-layer network unless there is a nonlinear activation function between layers. Calculating the output of a two-layer network consists of multiplying the input vector by the first weight matrix and then (if there is no nonlinear activation function) multiplying the resulting vector by the second weight matrix. This may be expressed as

$$(XW_1)W_2$$

Since matrix multiplication is associative, the term may be regrouped and written

$$X(W_1 W_2)$$
This shows that a two-layer linear network is exactly equivalent to a single layer having a weight matrix equal to the product of the two weight matrices. Hence, any multilayer linear network can be replaced by an equivalent one-layer network. Since single-layer networks are severely limited in their computational capability, nonlinear activation functions are vital to expanding the network's capability beyond that of the single-layer network.
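The collapse of a linear two-layer network into one layer can be checked numerically. The sketch below uses hypothetical matrices and compares (XW1)W2 with X(W1W2); it then inserts a threshold nonlinearity between the layers to show that the regrouping no longer applies.

```python
def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def step(matrix):
    """Element-wise threshold (OUT = 1 if NET > 0 else 0), used as the nonlinearity."""
    return [[1 if v > 0 else 0 for v in row] for row in matrix]

X = [[1, -2]]                      # one input vector (1 x 2)
W1 = [[1, 3, 0], [2, -1, 1]]       # first layer: 2 inputs -> 3 neurons
W2 = [[1, 0], [0, 2], [-1, 1]]     # second layer: 3 inputs -> 2 neurons

linear_two_layer = matmul(matmul(X, W1), W2)   # (X W1) W2
single_layer = matmul(X, matmul(W1, W2))       # X (W1 W2)
print(linear_two_layer, single_layer)          # [[-1, 8]] [[-1, 8]]

nonlinear_two_layer = matmul(step(matmul(X, W1)), W2)
print(nonlinear_two_layer)                     # [[0, 2]], no longer equal to X (W1 W2)
```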

Recurrent networks 

The networks considered up to this point have no feedback connections, that is, connections through weights extending from the outputs of a layer to the inputs of the same or previous layers. Networks of this special class are called nonrecurrent or feedforward networks.

2.2 Terminologies, Notation, and Representation of Artificial Neural Networks

Many authors avoid the term “neuron” when referring to the artificial neuron, recognizing that it is a crude approximation of its biological model. Here, the terms “neuron”, “cell”, and “unit” are used interchangeably as shorthand for “artificial neuron”, as these words are self-explanatory.

Learning algorithms, like artificial neural networks in general, can be presented in either differential-equation or difference-equation form. The differential-equation representation assumes that the processes are continuous, operating much like a large analog network. Viewed at a microscopic level, the biological system does not behave this way: the activation level of a biological neuron is determined by the average rate at which it emits discrete action potential pulses down its axon. This average rate is commonly treated as an analog quantity, but it is important to remember the underlying reality.

If one wishes to simulate artificial neural networks on an analog computer, differential-equation representations are highly desirable. However, most work today is being done on digital computers, making the difference-equation form most appropriate, as these equations can easily be converted into computer programs. For this reason, the difference-equation representation is widely used.

2.3 Training of Artificial Neural Networks 

Of all the interesting characteristics of artificial neural networks, none captures the imagination like their ability to learn. Their training shows so many parallels to the intellectual development of human beings that it may seem that we have achieved a fundamental understanding of this process.
Objective of Training 

A network is trained so that application of a set of inputs produces the desired set of outputs. Each such input (or output) set is referred to as a vector. Training is accomplished by sequentially applying input vectors while adjusting network weights according to a predetermined procedure. During training, the network weights gradually converge to values such that each input vector produces the desired output vector.

Supervised Training 

Training algorithms are categorized as supervised and unsupervised. Supervised training requires the pairing of each input vector with a target vector representing the desired output; together these are called a training pair. Usually a network is trained over a number of such training pairs. An input vector is applied, the output of the network is calculated and compared to the corresponding target vector, and the difference (error) is fed back through the network and weights are changed according to an algorithm that tends to minimize the error.

The vectors of the training set are applied sequentially, and errors are calculated and weights adjusted for each vector, until the error for the entire training set is at an acceptably low level.
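One common instance of this error-correcting procedure is the delta (Widrow-Hoff) rule for a single linear neuron, chosen here only as an illustration since the text does not name a specific algorithm. The sketch applies the training pairs one at a time, feeds the error back into the weights, and stops when the summed squared error over the whole training set is below a tolerance. The learning rate, tolerance and training pairs are hypothetical.

```python
def train_delta_rule(pairs, lr=0.1, tolerance=1e-8, max_epochs=1000):
    """Supervised training of one linear neuron with the delta (Widrow-Hoff) rule.

    Each training pair is applied in turn; the error (target - output) is fed back
    to adjust the weights, and training stops once the summed squared error over
    the whole training set falls below `tolerance`."""
    w = [0.0] * len(pairs[0][0])
    for epoch in range(1, max_epochs + 1):
        total_error = 0.0
        for x, target in pairs:
            out = sum(xi * wi for xi, wi in zip(x, w))    # actual response
            err = target - out                             # error for this pair
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            total_error += err ** 2
        if total_error < tolerance:
            return w, epoch
    return w, max_epochs

# Hypothetical training pairs generated by target = 2*x1 - x2.
pairs = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
weights, epochs = train_delta_rule(pairs)
print([round(v, 4) for v in weights], epochs)
```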



Fig. 2.7: Block Diagrams for Explanation of Basic Learning Modes: (a) Supervised Training, (b) Unsupervised Training

In supervised learning, we assume that at each instant of time when the input is applied, the desired response d of the system is provided by the teacher. This is illustrated in Fig. 2.7(a). The distance ρ[d, o] between the actual response o and the desired response d serves as an error measure and is used to correct network parameters externally. Since we assume adjustable weights, the teacher may implement a reward-and-punishment scheme to adapt the network's weight matrix W. For instance, in learning classifications of input patterns or situations with known responses, the error can be used to modify weights so that the error decreases. This mode of learning is very pervasive, and it is also used in many situations of natural learning. A set of input and output patterns, called a training set, is required for this learning mode.

Typically, supervised learning rewards accurate classifications or associations and punishes those which yield inaccurate responses. The teacher estimates the negative error gradient direction and reduces the error accordingly. In many situations, the inputs, outputs and the computed gradient are deterministic; however, the minimization of error proceeds over all its random realizations. As a result, most supervised learning algorithms reduce to stochastic minimization of error in multidimensional weight space.

Figure 2.7(b) shows the block diagram of unsupervised learning. In learning without supervision, the desired response is not known; thus, explicit error information cannot be used to improve network behavior. Since no information is available as to the correctness or incorrectness of responses, learning must somehow be accomplished based on observations of responses to inputs that we have marginal or no knowledge about.

Unsupervised Training 

Despite many application successes, supervised training has been criticized as being biologically implausible; it is difficult to conceive of a training mechanism in the brain that compares desired and actual outputs, feeding processed corrections back through the network. If this were the brain's mechanism, where would the desired output patterns come from? How could the brain of an infant accomplish the self-organization that has been proven to exist in early development? Unsupervised training is a far more plausible model of learning in the biological system. Developed by Kohonen (1984) [40] and many others, it requires no target vectors for the outputs and, hence, no comparisons to predetermined ideal responses. The training set consists solely of input vectors. The training algorithm modifies network weights to produce output vectors that are consistent; that is, both application of one of the training vectors and application of a vector that is sufficiently similar to it will produce the same pattern of outputs. The training process, therefore, extracts the statistical properties of the training set and groups similar vectors into classes. Applying a vector from a given class to the input will produce a specific output vector, but there is no way to determine prior to training which specific output pattern will be produced by a given input vector class. Hence, the outputs of such a network must generally be transformed into a comprehensible form subsequent to the training process. This does not represent a serious problem: it is usually a simple matter to identify the input-output relationship established by the network.
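As a simple, hedged illustration of training without target vectors, the sketch below uses plain winner-take-all competitive learning (no neighborhood function, so it is simpler than a full Kohonen map). Only input vectors are used: for each input, the closest prototype moves a fraction `lr` toward it. Which output index ends up representing which group depends on the starting prototypes, mirroring the remark that the output pattern for a class cannot be known before training. The data, starting prototypes and parameters are hypothetical.

```python
def squared_distance(x, p):
    """Squared Euclidean distance between two vectors."""
    return sum((xi - pi) ** 2 for xi, pi in zip(x, p))

def winner(x, prototypes):
    """Index of the prototype closest to x (the winning output unit)."""
    return min(range(len(prototypes)), key=lambda k: squared_distance(x, prototypes[k]))

def train_competitive(vectors, prototypes, lr=0.2, epochs=20):
    """Unsupervised winner-take-all training: no targets, only input vectors.

    For each input, only the winning prototype moves a fraction lr toward it."""
    protos = [list(p) for p in prototypes]
    for _ in range(epochs):
        for x in vectors:
            k = winner(x, protos)
            protos[k] = [pi + lr * (xi - pi) for xi, pi in zip(x, protos[k])]
    return protos

# Hypothetical unlabeled inputs forming two groups, and two arbitrary starting prototypes.
inputs = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [4.9, 5.1]]
protos = train_competitive(inputs, [[1.0, 1.0], [4.0, 4.0]])
print([winner(x, protos) for x in inputs])   # [0, 0, 0, 1, 1, 1]
```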

Training Algorithms 

Most of today's training algorithms have evolved from the concepts of D. O. Hebb (1961) [41]. He proposed a model for unsupervised learning in which the synaptic strength (weight) was increased if both the source and destination neurons were activated. In this way, often-used paths in the network are strengthened, and the phenomena of habit and learning through repetition are explained.

An artificial neural network using Hebbian learning will increase its network weights according to the product of the excitation levels of the source and destination neurons. In symbols,

$$w_{ij}(n+1) = w_{ij}(n) + \alpha\,\mathrm{OUT}_i\,\mathrm{OUT}_j$$

where
$w_{ij}(n)$ = the value of a weight from neuron i to neuron j prior to adjustment,
$w_{ij}(n+1)$ = the value of a weight from neuron i to neuron j after adjustment,
$\alpha$ = the learning-rate coefficient,
$\mathrm{OUT}_i$ = the output of neuron i and input to neuron j,
$\mathrm{OUT}_j$ = the output of neuron j.
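The Hebbian update can be applied to a whole weight matrix at once, with `weights[i][j]` connecting source neuron i to destination neuron j. In this sketch the 0/1 outputs and the value of α are hypothetical; it shows that only the connection between two simultaneously active neurons grows, and that repeating the same activity strengthens it further.

```python
def hebbian_update(weights, out_source, out_dest, alpha):
    """w_ij(n+1) = w_ij(n) + alpha * OUT_i * OUT_j.

    weights[i][j] connects source neuron i to destination neuron j."""
    return [[weights[i][j] + alpha * out_source[i] * out_dest[j]
             for j in range(len(out_dest))]
            for i in range(len(out_source))]

w = [[0.0, 0.0],
     [0.0, 0.0]]
# Only source neuron 0 and destination neuron 0 are active (0/1 outputs).
w = hebbian_update(w, out_source=[1, 0], out_dest=[1, 0], alpha=0.5)
print(w)   # [[0.5, 0.0], [0.0, 0.0]]
# Repeating the same activity strengthens the same path again.
w = hebbian_update(w, out_source=[1, 0], out_dest=[1, 0], alpha=0.5)
print(w)   # [[1.0, 0.0], [0.0, 0.0]]
```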




Chapter 3: Neural Network Algorithms 


Pattern recognition is a challenging problem. It has been attempted using neural network algorithms, and the associated difficulties in obtaining solutions have also been cited. To the best of our knowledge, no attempt has been made to determine the capacity of any single neural network algorithm to recognize all the English letters A-Z. This chapter details existing techniques for pattern recognition, and an attempt has been made to demonstrate the capacity of each algorithm.

3.1 Bidirectional Associative Memory (BAM) 

The bidirectional associative memory (BAM) is a heteroassociative,
content-addressable memory. A BAM consists of neurons arranged in two
layers, say A and B. The neurons are bipolar binary. The neurons in one layer
are fully interconnected to the neurons in the other layer, and there are no
interconnections among neurons in the same layer. The weights from layer A
to layer B are the same as the weights from layer B to layer A; this type of
interconnection is called logical symmetry of interconnections [42]. The BAM
uses both forward and backward information flow to produce an associative
search for stored patterns (Kosko 1987, 1988) [43-44]. Suppose that p vector
association pairs are stored in the memory:

$$\{(a^{(1)}, b^{(1)}), (a^{(2)}, b^{(2)}), \ldots, (a^{(p)}, b^{(p)})\} \tag{3.1}$$
When the memory neurons are activated, the network evolves to a stable
state of two-pattern reverberation, with one pattern at the output of each
layer. The stable reverberation corresponds to a local energy minimum. The
network's dynamics involve interaction between the two layers. Because the
memory processes information in time and involves bidirectional data flow, it
differs in principle from a linear associator, although both networks are used
to store association pairs. It also differs from the recurrent autoassociative
memory in its update mode.

3.1.1 Memory Architecture 

The basic diagram of the bidirectional associative memory is shown in Fig.
3.1. Let us assume that an initializing vector b is applied at the input of
layer A.



Fig. 3.1: Bidirectional Associative Memory: General Diagram (layer A and layer B)
The neurons are assumed to be bipolar binary. The input is processed
through the linear connection layer and then through the bipolar threshold
function as follows:

$$a' = \Gamma[W b] \tag{3.1a}$$

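where $\Gamma[\cdot]$ is a nonlinear diagonal matrix operator defined as

$$\Gamma[\cdot] = \begin{bmatrix} \operatorname{sgn}(\cdot) & 0 & \cdots & 0 \\ 0 & \operatorname{sgn}(\cdot) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \operatorname{sgn}(\cdot) \end{bmatrix}$$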

This pass consists of a matrix multiplication and a bipolar thresholding
operation, so that the i-th output is

$$a_i' = \operatorname{sgn}\left(\sum_{j=1}^{m} w_{ij} b_j\right), \quad i = 1, 2, \ldots, n \tag{3.1b}$$

Assume that the thresholding in (3.1a) and (3.1b) is synchronous, and that
the vector a' now feeds layer B of neurons. It is processed in layer B
through a similar matrix multiplication and bipolar thresholding, but the
processing now uses the transposed matrix $W^{T}$:

$$b' = \Gamma[W^{T} a'] \tag{3.1c}$$

or, for the j-th output,

$$b_j' = \operatorname{sgn}\left(\sum_{i=1}^{n} w_{ij} a_i'\right), \quad j = 1, 2, \ldots, m \tag{3.1d}$$

From now on, the retrieval sequence repeats: (3.1a) or (3.1b) is used to
compute a'', then (3.1c) or (3.1d) is used to compute b'', and so on. The
process continues until further updates of a and b stop. In terms of the
recursive update mechanism, the retrieval consists of the following steps:
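$$\begin{aligned}
\text{First forward pass:} \quad & a^{1} = \Gamma[W b^{0}] \\
\text{First backward pass:} \quad & b^{2} = \Gamma[W^{T} a^{1}] \\
\text{Second forward pass:} \quad & a^{3} = \Gamma[W b^{2}] \\
& \;\;\vdots \\
k/2\text{-th backward pass:} \quad & b^{k} = \Gamma[W^{T} a^{k-1}]
\end{aligned} \tag{3.2}$$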

Ideally, this back-and-forth flow of updated data quickly equilibrates,
usually in one of the fixed pairs $(a^{(i)}, b^{(i)})$ from (3.1). Let us
consider in more detail the design of a memory that would achieve this aim.



Fig. 3.2: Bidirectional Associative Memory: Simplified Diagram

Fig. 3.2 shows the simplified diagram of the bidirectional associative
memory often encountered in the literature. Layers A and B operate in an
alternating fashion: the neurons' output signals are first transferred toward
the right using the matrix W, and then toward the left using the matrix
$W^{T}$.
The bidirectional associative memory maps bipolar binary vectors
$a = [a_1\ a_2\ \ldots\ a_n]^{T}$, $a_i = \pm 1$, $i = 1, 2, \ldots, n$, into
vectors $b = [b_1\ b_2\ \ldots\ b_m]^{T}$, $b_j = \pm 1$, $j = 1, 2, \ldots, m$,
or vice versa. The mapping can also be performed for unipolar binary
vectors. The input-output transformation is highly nonlinear because of the
threshold-based state transitions.

For proper memory operation, it must be assumed that no state changes
occur in the neurons of layers A and B at the same time. The data between
layers must flow in a circular fashion: A → B → A, and so on. The
convergence of the memory is proved by showing that either synchronous or
asynchronous state changes of a layer decrease the energy; that is, the
energy value is reduced by a single update. Because the energy of the
memory is bounded from below, the memory gravitates to fixed points. Since
the stability of this type of memory is not affected by asynchronous versus
synchronous state updates, it seems wise to assume synchronous operation.
This results in larger energy changes and thus produces much faster
convergence than asynchronous updates, which are serial by nature and thus
slow.

3.1.2 Association Encoding and Decoding 

The information (3.1) is coded into the bidirectional associative memory
using the customary outer product rule, that is, by adding p
cross-correlation matrices. The formula for the weight matrix is

$$W = \sum_{i=1}^{p} a^{(i)} b^{(i)T}$$

where $a^{(i)}$ and $b^{(i)}$ are the bipolar binary vectors that form the
i-th pair.

3.1.3 Algorithm 

Step 1: The associations between pattern pairs are stored in the memory in
the form of bipolar binary vectors with entries -1 and 1:
$$\{(a^{(1)}, b^{(1)}), (a^{(2)}, b^{(2)}), \ldots, (a^{(p)}, b^{(p)})\}$$

Vector a stores a pattern and is n-dimensional; vector b is m-dimensional
and stores the associated output.

Step 2: The weights are calculated by

$$W = \sum_{i=1}^{p} a^{(i)} b^{(i)T}$$


Step 3: A test vector pair a and b is given as input.

Step 4: In the forward pass, b is given as input and a is calculated as

$$a = \Gamma[W b]$$

Each element of vector a is given by

$$a_i = \operatorname{sgn}\left(\sum_{j=1}^{m} w_{ij} b_j\right), \quad i = 1, 2, \ldots, n$$




Step 5: Vector a is now given as input to the second layer during the
backward pass. The output of this layer is given by

$$b' = \Gamma[W^{T} a]$$

Each element of vector b' is given by

$$b_j' = \operatorname{sgn}\left(\sum_{i=1}^{n} w_{ij} a_i\right), \quad j = 1, 2, \ldots, m$$

Step 6: If there is no further update, the process stops; otherwise, Steps 4
and 5 are repeated.
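Steps 1 to 6 can be sketched in plain Python as follows. This is a hedged illustration rather than the program used for Table 3.1: the function names are hypothetical, the toy pairs are much smaller than the 7x7 letters, and a zero net input is mapped to +1, a tie-breaking choice the text does not specify.

```python
def sgn(x):
    # Bipolar threshold; mapping x == 0 to +1 is an assumption of this sketch.
    return 1 if x >= 0 else -1


def bam_encode(pairs):
    """Step 2: W = sum over pairs of the outer product a b^T (an n x m matrix)."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    w = [[0] * m for _ in range(n)]
    for a, b in pairs:
        for i in range(n):
            for j in range(m):
                w[i][j] += a[i] * b[j]
    return w


def bam_recall(w, b, max_iters=100):
    """Steps 3-6: alternate a = sgn(W b) and b = sgn(W^T a) until both stop changing."""
    n, m = len(w), len(w[0])
    a = None
    for _ in range(max_iters):
        # Forward pass (Step 4): a_i = sgn(sum_j w_ij b_j)
        new_a = [sgn(sum(w[i][j] * b[j] for j in range(m))) for i in range(n)]
        # Backward pass (Step 5): b_j = sgn(sum_i w_ij a_i), i.e. W^T a
        new_b = [sgn(sum(w[i][j] * new_a[i] for i in range(n))) for j in range(m)]
        if new_a == a and new_b == b:  # Step 6: no further update
            break
        a, b = new_a, new_b
    return a, b


# Two hypothetical association pairs (n = 6, m = 3).
a1, b1 = [1, 1, 1, -1, -1, -1], [1, -1, 1]
a2, b2 = [1, -1, 1, -1, 1, -1], [-1, 1, 1]
W = bam_encode([(a1, b1), (a2, b2)])
print(bam_recall(W, b1))  # ([1, 1, 1, -1, -1, -1], [1, -1, 1])
```

Each stored b recalls its partner a here because the toy pairs overlap only slightly; with more, or more similar, pairs the recall can settle on a spurious pair.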

3.1.4 Storage Capacity 

Kosko (1988) [44] has shown that the upper limit on the number p of pattern
pairs that can be stored and successfully retrieved is min(m, n). The
substantiation for this estimate is rather heuristic. The memory storage
capacity of the BAM is

$$p \le \min(m, n)$$

A more conservative heuristic capacity measure is also used in the
literature.

3.1.5 Results 

The network has 49 neurons in the first layer and five neurons in the second
layer. With this configuration, the network is capable of storing six
letters. Each letter is represented on a 7x7 pattern grid, from which it is
obtained in the form of a vector with entries 1 and -1. Each letter is stored
as a vector a together with its associated output vector b, and the weight
matrix is obtained by summing the outer products of a and b. The network has
been tested for A, I, L, M, X and P, and it gives the correct associated
output when these characters are presented at testing time. The results are
shown in Table 3.1.

Character | Stored input a | Stored input b | Obtained output
A | [-1-1111-1-1-11-1-1-11-11-1-1-1-1-1111111111-1-1-1-1-111-1-1-1-1-111-1-1-1-1-11] | [-1 -1 -1 -1 1] | [-1 -1 -1 -1 1]
I | [-111111-1-1-1-11-1-1-1-1-1-11-1-1-1-1-1-11-1-1-1-1-1-11-1-1-1-1-1-11-1-1-1-111111-1] | [-1 1 -1 -1 1] | [-1 1 -1 -1 1]
L | [1-1-1-1-1-1-11-1-1-1-1-1-11-1-1-1-1-1-11-1-1-1-1-1-11-1-1-1-1-1-11-1-1-1-1-1-1111111] | [-1 1 1 -1 -1] | [-1 1 1 -1 -1]
M | [1-1-1-1-1-1111-1-1-1111-11-11-111-1-11-1-111-1-1-1-1-11-1-1-1-1-111-1-1-1-1-11] | [-1 1 1 -1 1] | [-1 1 1 -1 1]
P | [11111-1-11-1-1-1-11-11-1-1-1-1-111-1-1-1-11-111111-1-11-1-1-1-1-1-11-1-1-1-1-1-1] | [1 -1 -1 -1 -1] | [1 -1 -1 -1 -1]
X | [1-1-1-1-1-11-11-1-1-11-1-1-11-11-1-1-1-1-11-1-1-1-1-11-11-1-1-11-1-1-11-11-1-1-1-1-11] | [1 1 -1 -1 -1] | [1 1 -1 -1 -1]

Table 3.1: Given Input and Obtained Output in BAM

3.1.6 Merits and Demerits 

In a BAM, a distorted input pattern may still produce the correct
heteroassociation at the output. However, the logical symmetry of the
interconnections severely hampers the efficiency of the BAM in pattern
storage and recall, and it also limits its use for knowledge representation
and inference [45]. In addition, the number of pattern pairs that can be
stored and successfully retrieved is limited.

3.2 Hopfield Autoassociative Memory 

The Hopfield network is an associative memory whose primary function is to
retrieve a pattern stored in memory when an incomplete or noisy version of
that pattern is presented.

In the Hopfield model, each neuron has two states: the on state is denoted
by the output +1 and the off state by -1. A pair of neurons i and j in the
network is connected by a weight $w_{ij}$, which denotes the contribution of
the output signal of neuron i to the potential acting on neuron j. The
Hopfield network operates in two phases: (i) the storage phase and (ii) the
retrieval phase. In the storage phase, n-dimensional patterns are stored in
the memory. These patterns can be retrieved during the retrieval phase when
they are presented to the network as test vectors.
3.2.1 Algorithm

Step 1: The n-dimensional vectors $a_1, a_2, a_3, \ldots, a_p$, with entries +1
and -1, are stored in the memory.

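Step 2: The weights of the network are calculated as

$$w_{ij} = \begin{cases} \sum_{m=1}^{p} a_{mi}\, a_{mj}, & i \neq j \\ 0, & i = j \end{cases}$$

where $a_{mi}$ is the i-th entry of the stored vector $a_m$. The weights are
not updated during the iterations.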

Step 3: An n-dimensional probe vector x is presented to the network. The
algorithm is initialized by

$$a_j(0) = x_j, \quad j = 1, 2, 3, \ldots, n$$

Step 4: The elements of the vector a(k) are updated by

$$a_i(k+1) = \operatorname{sgn}\left(\sum_{j=1}^{n} w_{ij}\, a_j(k)\right), \quad i = 1, 2, \ldots, n$$


Step 5: If, during any iteration, $a(k+1) = a(k)$, there is no further
iteration; that is, the vector a has become stable. Otherwise, Step 4 is
repeated.


Step 6: The stable state $a_{\text{fixed}}$ is the output vector of the network.
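A minimal Python sketch of Steps 1 to 6 follows. It is an illustration under stated assumptions, not the thesis program: the neurons are updated one at a time in index order (asynchronous updating, which the text leaves open), a zero net input is mapped to +1, and the two 8-element patterns are hypothetical.

```python
def sgn(x):
    # Zero net input is mapped to +1 (an assumption; the text does not say).
    return 1 if x >= 0 else -1


def hopfield_weights(patterns):
    """Step 2: w_ij = sum_m a_mi * a_mj for i != j, and w_ii = 0."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for a in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += a[i] * a[j]
    return w


def hopfield_recall(w, x, max_sweeps=100):
    """Steps 3-6 with asynchronous updates; returns the stable vector a_fixed."""
    a = list(x)                       # Step 3: a_j(0) = x_j
    n = len(a)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):            # Step 4, one neuron at a time
            new_ai = sgn(sum(w[i][j] * a[j] for j in range(n)))
            if new_ai != a[i]:
                a[i] = new_ai
                changed = True
        if not changed:               # Step 5: a(k+1) == a(k), a is stable
            break
    return a                          # Step 6: a_fixed


p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
W = hopfield_weights([p1, p2])
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]  # p1 with its first entry flipped
print(hopfield_recall(W, noisy) == p1)  # True
```

Asynchronous updating is used here because, with symmetric weights and a zero diagonal, it is guaranteed to reach a stable state; purely synchronous updating can instead oscillate between two states.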

3.2.2 Storage Capacity 

The storage capacity of the Hopfield network is given by

$$p_{\max} = \frac{n}{2 \ln n}$$

where $p_{\max}$ is the maximum number of fundamental memories that can be
stored and n is the dimension of the stored vectors (the number of neurons).
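As a quick check of this formula, it can be evaluated for the 49-neuron network used in the next section. The short Python function below is only an arithmetic aid (its name is hypothetical) and uses the natural logarithm, as in the formula.

```python
import math


def hopfield_capacity(n):
    """p_max = n / (2 ln n) for an n-neuron Hopfield memory."""
    return n / (2 * math.log(n))


print(int(hopfield_capacity(49)))  # 6
```

For n = 49 the formula gives about 6.3, so at most six whole patterns, which agrees with the six characters stored in Section 3.2.3.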
3.2.3 Result

The network has two layers, with 49 neurons in each layer, and a 7x7 pattern
grid is used to represent a character. With this configuration, the network
is able to store and successfully recall six characters. The six characters
Y, O, T, L, M and I are stored in the network, each represented as a
49-dimensional vector with entries +1 and -1, and the weight matrix is
computed from the stored patterns. After storage, when these characters are
presented individually to the network, they are recognized correctly, as
shown in Table 3.2. When A, I, L, M, X and P are stored instead, not all of
them are recognized correctly at testing time: I, L, M, X and P are
recognized correctly, but when the character A is presented as the probe
vector, the network gives the output shown in Table 3.3.

3.2.4 Merits and Demerits 

The Hopfield network recognizes patterns well even when the input vector is
equal to a stored pattern plus random error. The Hopfield network
demonstrates the power of recurrent neural processing within a parallel
architecture: the recurrences through the thresholding layer of neurons
eliminate the noise added to the initializing input vector [46-47].

The Hopfield associative memory has capacity limitations, and exceeding the
capacity causes convergence to spurious states. Spurious states are stable
states of the network that are different from the stored patterns, or
fundamental memories, of the network [48]. The Hopfield memory also has
difficulty recovering stored patterns that are close to each other in
Hamming distance.



3.3 Feature Recognition Neural Network 

The feature recognition neural network (FRNN) is also used for character
recognition. This network learns patterns by remembering their different
segments; that is, it uses a recognition-by-parts technique. As a result,
noise or deformation in one section of a pattern does not affect the overall
recognition process. This idea is the basis of the feature recognition
algorithm [49].

This network has two levels. The first level detects the subpatterns; the
second level detects the patterns themselves and provides the output class.
The pattern grid distributes the inputs over the first level. The
first-level neurons that detect the subpatterns of one pattern all feed a
single neuron in level two, which fires fully when all of those subpatterns
are detected. A second-level neuron is partially excited whenever any one of
the first-level neurons feeding it detects its subpattern.

To recognize a pattern, the network goes through the following steps:

(1) The network learns all the training patterns and remembers their
subpatterns.

(2) The new (test) pattern is also divided into subpatterns.

(3) Each subpattern is compared individually against the corresponding
subpatterns of the training patterns, and suitable matches are found.

(4) The network then counts the number of subpatterns matched for each
pattern and finds the closest overall match.

3.3.1 Algorithm 

Step 1: A set of training vectors

$$a_p = (a_{1p}, a_{2p}, \ldots, a_{np}), \quad p = 1, 2, \ldots, P$$

is given as input, where $a_{ip} = 0$ or 1 and n is the dimension of the
training vectors.

Step 2: The weights are

$$W_p = (w_{1p}, w_{2p}, \ldots, w_{np})$$

where $w_{ip} = 1$ if $a_{ip} = 1$ and $w_{ip} = -1$ if $a_{ip} = 0$.

Step 3: Each pattern is divided into S subpatterns, given by

$$a_{sp} = (a_{1sp}, a_{2sp}, \ldots, a_{msp}), \quad s = 1, 2, \ldots, S, \quad p = 1, 2, \ldots, P$$

where m is the dimension of the vector obtained from each subpattern.

Step 4: The weight vector formed from each subpattern is

$$W_{sp} = (w_{1sp}, w_{2sp}, \ldots, w_{msp})$$

where $w_{jsp} = 1$ if $a_{jsp} = 1$ and $w_{jsp} = -1$ if $a_{jsp} = 0$, for
j = 1, 2, ..., m.

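Step 5: For every subpattern of every training pattern, a threshold is
calculated as

$$\theta_{sp} = a_{sp} W_{sp}^{T}, \quad s = 1, 2, \ldots, S, \quad p = 1, 2, \ldots, P$$

which equals the number of on (value 1) elements in the subpattern.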

Step 6: The test pattern is presented to the network and stored as the
vector

$$a^{t} = (a^{t}_{1}, a^{t}_{2}, \ldots, a^{t}_{n})$$

Step 7: The test pattern is divided into subpatterns

$$a^{t}_{s} = (a^{t}_{1s}, a^{t}_{2s}, \ldots, a^{t}_{ms}), \quad s = 1, 2, \ldots, S$$
Step 8: The inner product of each subpattern of the test pattern with the
corresponding subpattern weights of every stored pattern is calculated:

$$\rho_{sp} = a^{t}_{s} W_{sp}^{T}, \quad s = 1, 2, \ldots, S, \quad p = 1, 2, \ldots, P$$

Step 9: Each threshold $\theta_{sp}$ is compared with $\rho_{sp}$, and the
variable parfire is set accordingly: if $\rho_{sp} \ge \theta_{sp}$ (partial
firing), $\mathrm{parfire}_{sp} = 1$; otherwise, $\mathrm{parfire}_{sp} = 0$.

Step 10: The overall firing of each neuron of the output layer is calculated
as

$$\mathrm{fire}_p = \sum_{s=1}^{S} \mathrm{parfire}_{sp}$$

Step 11: The maximum of $\mathrm{fire}_p$ over p = 1, 2, ..., P is selected;
the index p of the maximum gives the output class.
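The following Python sketch walks through Steps 1 to 11 on a toy 4x4 grid split into four 2x2 subpatterns (the experiments below use 9x9 grids with 3x3 subpatterns). It is an illustration, not the original program: the function names are hypothetical, subpatterns are taken as square blocks in row-major order, and the comparison rho >= theta in Step 9 is an assumption, since the comparison symbol is not legible in the source.

```python
def split_subpatterns(vec, grid, block):
    """Cut a row-major grid x grid vector into square block x block subpatterns."""
    subs = []
    for top in range(0, grid, block):
        for left in range(0, grid, block):
            subs.append([vec[r * grid + c]
                         for r in range(top, top + block)
                         for c in range(left, left + block)])
    return subs


def frnn_train(patterns, grid, block):
    """Steps 1-5: bipolar subpattern weights W_sp and thresholds theta_sp."""
    model = []
    for a in patterns:
        subs = []
        for a_s in split_subpatterns(a, grid, block):          # Step 3
            w_s = [1 if v == 1 else -1 for v in a_s]            # Step 4
            theta = sum(v * w for v, w in zip(a_s, w_s))        # Step 5
            subs.append((w_s, theta))
        model.append(subs)
    return model


def frnn_classify(model, test, grid, block):
    """Steps 6-11: count partially firing subpatterns and pick the best class."""
    test_subs = split_subpatterns(test, grid, block)           # Step 7
    fire = []
    for subs in model:
        count = 0
        for (w_s, theta), t_s in zip(subs, test_subs):
            rho = sum(v * w for v, w in zip(t_s, w_s))          # Step 8
            count += 1 if rho >= theta else 0                   # Step 9
        fire.append(count)                                      # Step 10
    best = max(range(len(fire)), key=lambda p: fire[p])         # Step 11
    return best, fire


top_left = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
bottom_right = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1]
model = frnn_train([top_left, bottom_right], grid=4, block=2)
noisy = top_left[:15] + [1]  # one stray pixel in the bottom-right block
print(frnn_classify(model, noisy, grid=4, block=2))  # (0, [3, 2])
```

With 0/1 test pixels and ±1 weights, rho can reach theta only when the test subpattern matches the stored one exactly, so fire_p counts exactly matching subpatterns; the stray pixel costs the correct class one subpattern, but it still wins 3 to 2.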

3.3.2 Results 

A three-layer network configuration has been used. The output layer has 26
neurons, one corresponding to each character, and 26 characters in the form
of 9x9 pattern grids are given to the network as training patterns. Each
pattern is divided into 9 subpatterns of size 3x3. The network correctly
classifies 23 characters when they are presented individually to the network
as test patterns; for each of these, the neuron corresponding to the
character number has the maximum value of the variable fire. The network
cannot differentiate O and Q from C, or R from P. The results are shown in
Table 3.4.

3.3.3 Merits and Demerits 

The network uses simple integer weights and converges in a single
iteration. It gives its result instantaneously, without waiting for any
stabilization period, and convergence is guaranteed.
On the other hand, the network is complex and involves a large number of
neurons. Although its structure is simpler than that of the Neocognitron, it
requires more neurons than the other methods. This demerit, however, is
outweighed by its storage capacity.


Table 3.4: Given Input and Obtained Output in FRNN

3.4 Hamming Network and MAXNET 

This is a two-layer classifier of bipolar binary vectors. The first layer,
the Hamming network itself, is capable of selecting the stored class that is
at the minimum Hamming distance (HD) from the test vector presented at the
input. The second layer, MAXNET, only suppresses the outputs other than the
maximum output node of the first layer [50].

The Hamming network is of the feed-forward type. The number of output
neurons in this part equals the number of classes, and the strongest
response of a neuron of this layer indicates the minimum HD between the
input vector and the class this neuron represents. The second layer is
MAXNET, which operates as a recurrent network and involves both excitatory
and inhibitory connections. The block diagram of this network is shown in
Fig. 3.3; it is a minimum Hamming distance classifier, which selects the
stored class that is at the minimum HD from the noisy or incomplete argument
vector presented at the input.
Fig. 3.3: Block diagram of the minimum HD classifier

This selection is essentially performed by the Hamming network alone. The
Hamming network is of the feed-forward type and constitutes the first layer
of the classifier. The p-class Hamming network has p output neurons, and the
strongest response of a neuron indicates the minimum HD between the input
and the category this neuron represents. The second layer of the classifier
is called MAXNET, and it operates as a recurrent recall network in an
auxiliary mode. Its only function is to suppress the values at MAXNET output
nodes other than the initially maximum output node of the first layer. The
proper part of the classifier is the Hamming network, which is responsible
for matching the input vector with the stored vectors. The expanded diagram
of the Hamming network for the classification of bipolar binary n-tuple input
vectors is shown in Fig. 3.4. The purpose of this layer is to compute, in a
feed-forward manner, the values of (n - HD), where HD is the Hamming distance
between the search argument and the encoded class prototype vector. Assume
that the n-tuple prototype vector of the m-th class is $s^{(m)}$, for
m = 1, 2, ..., p, and that the n-tuple input vector is x. The weight vectors

$$w_m = [w_{m1}\ w_{m2}\ \ldots\ w_{mn}]^{T}, \quad m = 1, 2, \ldots, p$$

connect the inputs to the m-th neuron, which acts as the class indicator.
Fig. 3.4: Hamming network for n-bit bipolar binary vectors representing p
classes: (a) classifier network and (b) neuron activation function

A vector classifier with p outputs, one for each class, can be conceived
such that the m-th output is 1 if and only if $x = s^{(m)}$. This would
require that the weights be $w_m = s^{(m)}$. The classifier outputs are then
$x^{T} s^{(1)}, x^{T} s^{(2)}, \ldots, x^{T} s^{(p)}$. When $x = s^{(m)}$, only
the m-th output equals n, provided the classes differ from each other and
assuming ±1 entries of x. The scalar product of the vectors is used here as an
obvious measure of vector matching. The scalar product $x^{T} s^{(m)}$ of two
bipolar binary n-tuple vectors can be written as the total number of
positions in which the two vectors agree minus the number of positions in
which they differ. The number of positions in which they differ is the
Hamming distance $\mathrm{HD}(x, s^{(m)})$, so the number of positions in
which they agree is $n - \mathrm{HD}$. The equality is written

$$x^{T} s^{(m)} = \left(n - \mathrm{HD}(x, s^{(m)})\right) - \mathrm{HD}(x, s^{(m)}) \tag{3.4.1}$$

This is equivalent to

$$\frac{1}{2} x^{T} s^{(m)} = \frac{n}{2} - \mathrm{HD}(x, s^{(m)}) \tag{3.4.2}$$
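As a small worked check of (3.4.1) and (3.4.2), take n = 4 with the hypothetical vectors $x = [1\ 1\ {-1}\ 1]^{T}$ and $s^{(m)} = [1\ {-1}\ {-1}\ {-1}]^{T}$, which differ in the second and fourth positions, so $\mathrm{HD}(x, s^{(m)}) = 2$:

$$x^{T} s^{(m)} = (1)(1) + (1)(-1) + (-1)(-1) + (1)(-1) = 0 = (4 - 2) - 2$$

$$\frac{1}{2} x^{T} s^{(m)} = 0 = \frac{4}{2} - 2$$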
We can now see that the weight matrix $W_H$ of the Hamming network can be
created by encoding the class prototype vectors as rows, in the form below:

$$W_H = \frac{1}{2}\begin{bmatrix} s^{(1)}_{1} & s^{(1)}_{2} & \cdots & s^{(1)}_{n} \\ s^{(2)}_{1} & s^{(2)}_{2} & \cdots & s^{(2)}_{n} \\ \vdots & \vdots & \ddots & \vdots \\ s^{(p)}_{1} & s^{(p)}_{2} & \cdots & s^{(p)}_{n} \end{bmatrix} \tag{3.4.3}$$


where the 1/2 factor is convenient for scaling purposes. The network with
input vector x now yields the value $\frac{1}{2} x^{T} s^{(m)}$ at the input
to node m, for m = 1, 2, ..., p. Adding the fixed bias value n/2 to the input
of each neuron results in the total input

$$\mathrm{net}_m = \frac{1}{2} x^{T} s^{(m)} + \frac{n}{2}, \quad m = 1, 2, \ldots, p \tag{3.4.4}$$

Using the identity (3.4.2), $\mathrm{net}_m$ can be expressed as

$$\mathrm{net}_m = n - \mathrm{HD}(x, s^{(m)}) \tag{3.4.5}$$

Let us use neurons in the Hamming network with the activation function shown
in Fig. 3.4(b). The neurons need to perform only a linear scaling of (3.4.5),
such that $f(\mathrm{net}_m) = \frac{1}{n}\mathrm{net}_m$, for m = 1, 2, ..., p.
Since the inputs lie between 0 and n, the output of each node is scaled down
to lie between 0 and 1. Furthermore, the number of the node with the highest
output indicates the class to which x is at the smallest HD. A perfect match
of the input vector to class m, which is equivalent to the condition HD = 0,
is signaled by $f(\mathrm{net}_m) = 1$. An input vector that is the
complement of the prototype of class m results in $f(\mathrm{net}_m) = 0$.
The response of the Hamming network essentially completes the
classification, in which only the first layer of Fig. 3.3 computes the
relevant matching score values.
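The feed-forward scoring of (3.4.4) and the linear activation can be sketched in Python as below. This is a hedged example: `hamming_scores` is a hypothetical name, the three 4-element prototypes are made up, and the code computes the outputs directly from the scalar products rather than building $W_H$ explicitly.

```python
def hamming_scores(prototypes, x):
    """Return f(net_m) = net_m / n, where net_m = x.s^(m) / 2 + n / 2 = n - HD."""
    n = len(x)
    scores = []
    for s in prototypes:
        net = 0.5 * sum(xi * si for xi, si in zip(x, s)) + n / 2  # (3.4.4)
        scores.append(net / n)                                     # linear f
    return scores


prototypes = [[1, 1, 1, 1], [1, -1, 1, -1], [-1, -1, -1, -1]]
print(hamming_scores(prototypes, [1, 1, -1, 1]))  # [0.75, 0.25, 0.25]
```

The input differs from the first prototype in one position (score (4 - 1)/4) and from the other two in three positions (score (4 - 3)/4), so the first node gives the strongest response.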
As seen, the classification by the Hamming network is performed in a
feed-forward and instantaneous manner.

MAXNET needs to be employed as a second layer only in cases in which an
enhancement of the initial dominant response of the m-th node is required. As
a result of MAXNET's recurrent processing, the m-th node responds positively,
whereas the responses of all the remaining nodes decay to zero. The MAXNET
structure is shown in Fig. 3.5.



Fig. 3.5: MAXNET for p classes: (a) network architecture and (b) neuron
activation function

MAXNET is a recurrent network involving both excitatory and inhibitory
connections. The excitatory connection within the network is implemented in
the form of a single positive self-feedback loop with a weighting coefficient
of one. All the remaining connections of this fully coupled feedback network
are inhibitory; they are represented as p - 1 cross-feedback synapses with
coefficients $-\varepsilon$ from each output. The second weight matrix,
$W_M$, of size p×p is thus of the form
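$$W_M = \begin{bmatrix} 1 & -\varepsilon & \cdots & -\varepsilon \\ -\varepsilon & 1 & \cdots & -\varepsilon \\ \vdots & \vdots & \ddots & \vdots \\ -\varepsilon & -\varepsilon & \cdots & 1 \end{bmatrix} \tag{3.4.6}$$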


where $\varepsilon$ must be bounded by $0 < \varepsilon < 1/p$. The quantity
$\varepsilon$ is called the lateral interaction coefficient. With the
activation function shown in Fig. 3.5(b) and the initializing inputs
fulfilling the conditions

$$0 \le y^{0}_{i} \le 1, \quad i = 1, 2, \ldots, p$$


the MAXNET network gradually suppresses all but the largest initial network
excitation. When initialized with the input vector $y^{0}$, the network
starts processing it by adding positive self-feedback and negative
cross-feedback. As a result of a number of recurrences, the only unsuppressed
node will be the one with the largest initializing entry $y^{0}_{m}$. This
means that the only node with a nonzero output response is the node closest
to the input vector argument in the HD sense. The recurrent processing by
MAXNET leading to this response is

$$y^{k+1} = \Gamma[W_M y^{k}] \tag{3.4.7}$$

where $\Gamma$ is a nonlinear diagonal matrix operator with entries
$f(\cdot)$ given below:

$$f(\mathrm{net}) = \begin{cases} 0, & \mathrm{net} < 0 \\ \mathrm{net}, & \mathrm{net} \ge 0 \end{cases} \tag{3.4.8}$$


Each entry of the updated vector decreases at the k-th recursion step of
(3.4.7) under the MAXNET update algorithm, with the largest entry decreasing
most slowly. This is due to the conditions on the entries of the matrix
(3.4.6), specifically the condition $0 < \varepsilon < 1/p$.




Assume that $y^{0}_{m}$ is the largest entry of $y^{0}$. During the first
recurrence, all entries of $y^{1}$ are computed on the linear portion of
f(net). The smallest of all $y^{0}$ entries will be the first to reach the
level f(net) = 0, assumed at the k-th step. The clipping of one output entry
slows down the decrease of the remaining entries of $y^{k+1}$ in all
forthcoming steps. Then the second smallest entry of $y^{0}$ reaches
f(net) = 0. The process repeats itself until all values except one, at the
output of the m-th node, have reached zero.
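The recursion (3.4.7) with the clipping function (3.4.8) can be sketched in Python as follows. It is a minimal illustration with a hypothetical function name: instead of building $W_M$, it uses the fact that row m of $W_M y^{k}$ equals $y^{k}_{m} - \varepsilon \sum_{i \ne m} y^{k}_{i}$, and it stops once at most one entry is still positive (exact ties between the largest initial entries never separate, so an iteration cap is kept).

```python
def maxnet(y0, eps, max_iters=1000):
    """Iterate y^{k+1} = f(W_M y^k) until at most one entry stays positive."""
    p = len(y0)
    if not 0 < eps < 1 / p:
        raise ValueError("eps must satisfy 0 < eps < 1/p")
    y = list(y0)
    for _ in range(max_iters):
        total = sum(y)
        # (W_M y)_m = y_m - eps * (sum of the other entries); f clips at zero
        y = [max(0.0, ym - eps * (total - ym)) for ym in y]
        if sum(1 for v in y if v > 0) <= 1:
            break
    return y


# Initial scores, e.g. Hamming-layer outputs [0.75, 0.25, 0.25]:
print([round(v, 2) for v in maxnet([0.75, 0.25, 0.25], eps=0.2)])  # [0.63, 0.0, 0.0]
```

The value 0.2 satisfies $0 < \varepsilon < 1/p$ for p = 3; the experiment in Section 3.4.3 uses 0.005 with p = 26.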


3.4.1 Algorithm 

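Step 1: Consider that the patterns to be classified are $a_1, a_2, \ldots, a_p$,
each of dimension n. The weights connecting the inputs to the neurons of the
Hamming network are given by the weight matrix

$$W_H = \frac{1}{2}\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pn} \end{bmatrix} \tag{3.4.9}$$

Step 2: An n-dimensional input vector x is presented to the input.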

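Step 3: The net input of each neuron of the Hamming network is

$$\mathrm{net}_m = \frac{1}{2} a_m^{T} x + \frac{n}{2}, \quad m = 1, 2, \ldots, p$$

where n/2 is the fixed bias applied to the input of each neuron of this layer.

Step 4: The output of each neuron of the first layer is

$$f(\mathrm{net}_m) = \frac{1}{n}\,\mathrm{net}_m, \quad m = 1, 2, \ldots, p$$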
Step 5: The output of the Hamming network is applied as the input to MAXNET:

$$y^{0}_{m} = f(\mathrm{net}_m), \quad m = 1, 2, \ldots, p$$

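Step 6: The weights of the MAXNET layer are taken as

$$W_M = \begin{bmatrix} 1 & -\varepsilon & \cdots & -\varepsilon \\ -\varepsilon & 1 & \cdots & -\varepsilon \\ \vdots & \vdots & \ddots & \vdots \\ -\varepsilon & -\varepsilon & \cdots & 1 \end{bmatrix}$$

where $\varepsilon$ must be bounded by $0 < \varepsilon < 1/p$; the quantity
$\varepsilon$ is called the lateral interaction coefficient. The dimension of
$W_M$ is p×p.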
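Step 7: The output of MAXNET is calculated as

$$y^{k+1} = \Gamma[W_M y^{k}], \quad k = 0, 1, 2, \ldots$$

where k denotes the iteration number and $\Gamma$ is a nonlinear diagonal
matrix operator with entries $f(\cdot)$ given as

$$f(\mathrm{net}) = \begin{cases} 0, & \mathrm{net} < 0 \\ \mathrm{net}, & \mathrm{net} \ge 0 \end{cases}$$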

Step 8: Each element of $y^{k}$ decreases as the number of iterations k
increases. The index of the maximum entry of $y^{k}$ gives the class to which
the test vector x belongs.

3.4.2 Storage Capacity 

The network does not have stringent capacity limits: the Hamming network is
capable of classifying all the stored patterns during testing.




3.4.3 Result 

The network has 26 neurons in both layers. All 26 characters are stored as
49-dimensional vectors whose entries are +1 and -1, where +1 represents an
on pixel and -1 an off pixel. The classifier correctly classifies each of
the 26 characters when they are presented individually to the network for
testing. The value of the lateral interaction coefficient used is 0.005,
which satisfies $0 < \varepsilon < 1/p = 1/26$, and the output of the neuron
corresponding to the character number is the highest.

3.4.4 Merits and Demerits 

The network architecture is very simple. This network is a counterpart of
the Hopfield autoassociative network, and its advantage is that it involves
fewer neurons and fewer connections than its counterpart. There is no
capacity limitation.

The Hamming network retrieves only the index of the closest class, not the
entire prototype vector. It is not able to restore any of the key patterns:
it provides passive classification only and has no mechanism for data
restoration.

3.5 Adaptive Resonance Theory 1 (ART1)

This network was developed by Carpenter and Grossberg [51, 52] and is
called the adaptive resonance theory 1 (ART1) network. It serves the purpose
of cluster discovery and learns the clusters in an unsupervised mode. The
ART1 network can accommodate new clusters without affecting the storage or
recall capabilities for clusters already learned.

The network produces clusters by itself, if such clusters are identified in
the input data, and stores the clustering information about patterns or
features without a priori information about the possible number and type of
clusters. Essentially, the network “follows the leader”: it originates the
first cluster with the first input pattern received, and then creates a
second cluster if the distance of the second pattern from the first cluster
exceeds a certain threshold; otherwise, the pattern is clustered with the
first cluster. This process of pattern inspection, followed by either
origination of a new cluster or acceptance of the pattern into an old
cluster, is the main step of ART1 network production.

The central part of the ART1 network computes the matching score reflecting
the degree of similarity of the present input to the previously encoded
clusters. This is done by the topmost layer of the network, which is
functionally identical to the Hamming network and MAXNET. The initializing
input to the m-th node of MAXNET is the familiar scalar product similarity
measure between the input x and the vector $w_m$. We thus have the initial
matching scores

$$y^{0}_{m} = w_m^{T} x, \quad m = 1, 2, \ldots, M \tag{3.5.1}$$

where $w_m = [w_{1m}\ w_{2m}\ \ldots\ w_{nm}]^{T}$.

Note that the double-subscript convention for the weights $w_{ij}$ is not
followed here: the first weight index denotes the input node number ("from"),
and the second index denotes the node number ("to"). This conforms to the
common notation used in the technical literature for this network. The
activation function f(net) of the MAXNET neurons is the one shown in
Fig. 3.5(b) and given by (3.4.8). It is also assumed that a unit delay
element stores each MAXNET neuron output signal for the unit time Δ during
recursions before it arrives back at the top-layer node input. The input of
the topmost layer is initialized with the vector $y^{0}$, whose entries are
computed as the matching scores (3.5.1), and thereafter the layer undergoes
the recurrent updates. We thus have for this portion of the network

$$y^{k+1} = \Gamma[W_M y^{k}] \tag{3.5.2}$$
The initializing matching-score vector $y^{0} = \mathrm{net}$ for (3.5.2) and
for the top-layer recurrences is given by the simple feed-forward mapping

$$y^{0} = W x \tag{3.5.3}$$

where W is the bottom-to-top processing weight matrix containing the entries
$w_{ij}$ as follows:

$$W = \begin{bmatrix} w_{11} & w_{21} & \cdots & w_{n1} \\ w_{12} & w_{22} & \cdots & w_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ w_{1M} & w_{2M} & \cdots & w_{nM} \end{bmatrix}$$

As for the MAXNET network, the single nonzero output for a large enough
recursion index k is produced by the j-th top-layer neuron. For this winning
neuron we have

$$\sum_{i=1}^{n} w_{ij} x_i = \max_{m = 1, 2, \ldots, M} \left(\sum_{i=1}^{n} w_{im} x_i\right) \tag{3.5.4}$$


3.5.1 Algorithm 

Step 1: A proper value of the vigilance threshold ρ is set, and the weights
are initialized for the n-tuple input vectors and M top-layer neurons. The
weight matrices W and V are of dimension M×n, and each is initialized with
identical entries:

$$W = \left[\frac{1}{1+n}\right], \qquad V = [1], \qquad 0 < \rho < 1$$

Step 2: A binary unipolar input vector x is presented at the input nodes:

$$x_i \in \{0, 1\}, \quad i = 1, 2, \ldots, n$$
Step 3: The matching scores are computed for each pattern; the output of the
forward network is calculated as

$$y^{0}_{m} = \sum_{i=1}^{n} w_{im} x_i, \quad m = 1, 2, \ldots, M$$

The best-matching existing cluster j is found according to the maximum
criterion:

$$y^{0}_{j} = \max_{m = 1, 2, \ldots, M} \left(y^{0}_{m}\right)$$

Step 4: The vigilance test is performed for the winning neuron j:

$$\frac{1}{\lVert x \rVert} \sum_{i=1}^{n} v_{ij} x_i > \rho$$

where ρ is the vigilance parameter and the norm $\lVert x \rVert$ is defined
for the purpose of this algorithm as

$$\lVert x \rVert = \sum_{i=1}^{n} |x_i|$$

If the vigilance test is passed, the algorithm goes to Step 5. If the test
fails and the top layer has more than a single active node, the algorithm
goes to Step 6; otherwise, it goes to Step 5.

Step 5: In this step, the weight matrices are updated for the neuron j that
passed the vigilance test, so the updates are only for entries (i, j), where
i = 1, 2, ..., n:

$$w_{ij}(t+1) = \frac{v_{ij}(t)\, x_i}{0.5 + \sum_{k=1}^{n} v_{kj}(t)\, x_k}$$

$$v_{ij}(t+1) = x_i\, v_{ij}(t)$$

After the weights are updated, the algorithm goes to Step 2.

Step 6: Neuron j is deactivated by setting $y_j = 0$, so that this node does
not participate further in the cluster search. The algorithm goes back to
Step 3 and attempts to find a new cluster other than j for the pattern under
test.
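Steps 1 to 6 can be condensed into the following Python sketch. It is a hedged illustration, not the program behind Tables 3.5 and 3.6: `art1_cluster` is a hypothetical name, ties between equal matching scores go to the lowest node index, and an all-zero input is rejected because the vigilance test divides by the norm of x. As in Step 4, when only one active node is left, the winner is updated even if it fails the vigilance test.

```python
def art1_cluster(inputs, num_nodes, rho):
    """Assign each binary unipolar input to a top-layer node (ART1 Steps 1-6)."""
    n = len(inputs[0])
    w = [[1.0 / (1 + n)] * n for _ in range(num_nodes)]  # Step 1: bottom-up W
    v = [[1] * n for _ in range(num_nodes)]              # Step 1: top-down V
    labels = []
    for x in inputs:                                     # Step 2
        norm_x = sum(x)                                  # ||x|| = sum |x_i|
        if norm_x == 0:
            raise ValueError("input must contain at least one 1")
        active = set(range(num_nodes))
        while True:
            # Step 3: best matching score y_m = sum_i w_im x_i among active nodes
            j = max(sorted(active),
                    key=lambda m: sum(wi * xi for wi, xi in zip(w[m], x)))
            # Step 4: vigilance test (1 / ||x||) * sum_i v_ij x_i > rho
            passed = sum(vi * xi for vi, xi in zip(v[j], x)) / norm_x > rho
            if passed or len(active) == 1:
                break
            active.discard(j)                            # Step 6
        # Step 5: update the weights of node j only
        vx = [vi * xi for vi, xi in zip(v[j], x)]
        denom = 0.5 + sum(vx)
        w[j] = [val / denom for val in vx]
        v[j] = vx
        labels.append(j)
    return labels


inputs = [[1, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0]]
print(art1_cluster(inputs, num_nodes=3, rho=0.7))  # [0, 1]
print(art1_cluster(inputs, num_nodes=3, rho=0.3))  # [0, 0]
```

A higher vigilance splits the two inputs into separate clusters, while a lower one merges them, which mirrors the trend reported in Table 3.6.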
Fig. 3.6: Flowchart of the ART1 Encoding Algorithm for Unipolar Binary Inputs

3.5.2 Storage Capacity 

The ART1 network is not able to classify patterns of similar shape into two
different clusters; similar patterns are grouped into the same class.
3.5.3 Result

The ART1 network has been used to classify the 26 letters, with 26 neurons
in the top layer. Each of the 26 letters is stored as a 49-dimensional vector
with entries 0 and 1, and with a vigilance threshold of 0.95 the 26 letters
are classified into 14 classes. Characters that are even slightly similar in
shape are clustered by the network into the same class. The way in which the
characters are classified is shown in Table 3.5. The network has also been
tested for different vigilance factors. As the threshold increases, the
number of classes into which the characters are grouped increases, from 9
at 0.80 to 14 at 0.95, and stays at 14 for 0.98 and 0.99. The effect is
shown in Table 3.6.

3.5.4 Merits and Demerits 

Controlled discovery of clusters is one of the properties of the ART1
network. The ART1 network is capable of accommodating new patterns without
affecting the recall capabilities for patterns already learned. The weights
for clusters already stored are not modified.

The ART1 network is not capable of separating characters of slightly similar
shape, and the classification depends strongly on a proper value of the
vigilance factor ρ.


Characters | Cluster Number
A | 1
B | 6
C | 5
D | 10
E | 4
F | 4
G | 5

Vigilance factor | Number of classes | Characters classified in the same class
0.80 | 9 | S; E; I; T; J; F, K; C, D, G, L, O, Q, U; N, V, W, P; X, Y, Z; A, B, H, M, R
0.95 | 14 | A; E, F; C, G; B, R; I, J, T; L; D, O, U; P; S; V; Q, W; X, Z; Y; H, K, M, N
0.98 | 14 | A; E, F; C, G; S; I, J, T; L; U; P; R; V; D, Q, W; X, Z; Y; B, H, K, M, N, O
0.99 | 14 | A; E, F; C, G; S; I, J, T; L; U; P; R; V; D, Q, W; X, Z; Y; B, H, K, M, N, O

Table 3.6: Effect of Vigilance Factor on Classification in the ART1 Algorithm

Note: ";" separates the classes; characters separated by "," are clustered in the same class.

3.6 Back Propagation Algorithm 

Multilayer feedforward networks can be applied to a variety of classification and recognition problems [53, 54]. In the back propagation algorithm, it is not necessary to know the mathematical model of the classification or recognition problem in order to train the network and then recall information from it. If a suitable network architecture is selected and a sufficient training set is presented, the network may give an acceptable solution.

The back propagation algorithm allows input/output mapping within multilayer feedforward neural networks. Input patterns are submitted sequentially during training. If the classification of a submitted pattern is found to be erroneous, the weights and thresholds are adjusted so that the current least mean square classification error is reduced. The input/output mapping, the comparison of target and output values and the adjustments are continued until all the training patterns are learned within an acceptable error. During the classification phase, the trained neural network operates in a feedforward manner. The weight adjustments by the learning rule propagate backward from the output layer through the hidden layer to the input layer.

3.6.1 Algorithm 

Consider that there are P training pairs $\{a_1, b_1;\ a_2, b_2;\ \ldots;\ a_P, b_P\}$, where $a_i$ is the input vector and $b_i$ is the target output vector.

Step 1: The weights W and V are initialized to random values. V is the weight matrix between the input and hidden layers; W is the weight matrix between the hidden and output layers.

Step 2: Pattern z is submitted to the network and the responses of the hidden and output layer neurons are calculated. Here y is the output of the hidden layer neurons and o is the output of the output layer neurons:

$y = \Gamma[Vz]$

$o = \Gamma[Wy]$

where Γ is a nonlinear operator.
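
Step 3: The error is calculated as

$E \leftarrow E + \tfrac{1}{2}\lVert b - o \rVert^{2}$

where b is the target output vector for pattern z.

Step 4: The error signals $\delta_o$ and $\delta_y$ are calculated as

$\delta_{ok} = \tfrac{1}{2}(b_k - o_k)(1 - o_k^{2})$

$\delta_{yj} = f'_{yj} \sum_{k} \delta_{ok}\, w_{kj}, \qquad f'_{yj} = \tfrac{1}{2}(1 - y_j^{2})$

Step 5: The weights of the output layer are adjusted by

$W \leftarrow W + \eta\,\delta_o\,y^{t}$

Step 6: The weights of the hidden layer are adjusted by

$V \leftarrow V + \eta\,\delta_y\,z^{t}$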

Step 7: If there are more patterns in the training set, the whole process is repeated from step 2. If no patterns are left, it is checked whether $E < E_{max}$. If yes, the process is complete. If not, E is reset to zero and a new training cycle is started from step 2.
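A minimal NumPy sketch of steps 1 to 7 follows. It rests on several assumptions, none of which comes from this work: the operator Γ is taken as the bipolar sigmoid f(net) = 2/(1 + e^(-net)) − 1, whose derivative is ½(1 − f²); a fixed bias input of −1 is appended to z and y; and the initial weights are uniform in [−0.5, 0.5]. The names train_ebp and bipolar_sigmoid are also hypothetical.

```python
import numpy as np

def bipolar_sigmoid(net):
    # f(net) = 2 / (1 + exp(-net)) - 1; its derivative is 0.5 * (1 - f**2)
    return 2.0 / (1.0 + np.exp(-net)) - 1.0

def train_ebp(patterns, targets, n_hidden, eta=0.5, e_max=0.01,
              max_cycles=5000, seed=0):
    """Sketch of error back-propagation training (steps 1-7)."""
    rng = np.random.default_rng(seed)
    n_in, n_out = patterns.shape[1], targets.shape[1]
    # Step 1: small random weights; the extra column is a fixed -1 bias input (assumption)
    V = rng.uniform(-0.5, 0.5, (n_hidden, n_in + 1))   # input -> hidden
    W = rng.uniform(-0.5, 0.5, (n_out, n_hidden + 1))  # hidden -> output
    history = []
    for _ in range(max_cycles):
        E = 0.0                                        # error reset for each cycle
        for a, b in zip(patterns, targets):
            z = np.append(a, -1.0)
            y = np.append(bipolar_sigmoid(V @ z), -1.0)  # step 2: hidden response
            o = bipolar_sigmoid(W @ y)                   # step 2: output response
            E += 0.5 * np.sum((b - o) ** 2)              # step 3: accumulate error
            delta_o = 0.5 * (b - o) * (1.0 - o ** 2)     # step 4: output error signal
            # step 4: hidden error signal (bias column of W excluded), using W before its update
            delta_y = 0.5 * (1.0 - y[:-1] ** 2) * (W[:, :-1].T @ delta_o)
            W += eta * np.outer(delta_o, y)              # step 5
            V += eta * np.outer(delta_y, z)              # step 6
        history.append(E)
        if E < e_max:                                    # step 7: stop when E < E_max
            break
    return V, W, history

# Usage: bipolar XOR, a small problem that needs a hidden layer
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
T = np.array([[-1], [1], [1], [-1]], dtype=float)
V, W, history = train_ebp(X, T, n_hidden=4, max_cycles=2000)
print(len(history), history[-1])
```

The hidden error signal is computed before W is changed. This keeps step 4 separate from step 5, as in the step order above.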

3.6.2 Result 

The network architecture used was 35-16-26. Characters are represented by a 5x7 pattern grid, and all 26 characters were stored as 35-dimensional vectors with entries 0 and 1. The target output vectors form a 26x26 identity matrix. An error goal of 0.01 was taken, and the network was trained for a maximum of 5000 epochs. After training, the network was capable of recognizing all the characters correctly.

3.6.3 Merits/ Demerits 

This method provides accurate input-output mapping. Its advantage is that each adjustment step is computed quickly, and not all the patterns need to be presented simultaneously. Each step is computed without finding an overall direction of descent for the training cycle.

Multilayer feedforward neural networks trained by the back propagation algorithm are slow to learn.

3.7 Quickprop 

Standard Backpropagation calculates the weight change based upon the first derivative of the error with respect to the weight. If second-derivative information is also available, a better estimate of the optimum step can be found. Backpropagation networks are also slow to train. Quickprop is a variation of standard Backpropagation that speeds up training.

The Quickprop modification is an attempt to estimate and utilize second-derivative information [Fahlman 1988]. Quickprop requires saving the previous gradient vector as well as the previous weight change. The calculation of a weight change uses only information associated with the weight being updated:

$\Delta w_{ij}(n) = \dfrac{\nabla w_{ij}(n)}{\nabla w_{ij}(n-1) - \nabla w_{ij}(n)}\,\Delta w_{ij}(n-1)$

where $\nabla w_{ij}(n)$ is the gradient vector component associated with weight $w_{ij}$ in step n, $\nabla w_{ij}(n-1)$ is the gradient vector component associated with weight $w_{ij}$ in the previous step, and $\Delta w_{ij}(n-1)$ is the weight change in step (n − 1).

A maximum growth factor μ is used to limit the rate of increase of the step size:

If $|\Delta w_{ij}(n)| > \mu\,|\Delta w_{ij}(n-1)|$, then $\Delta w_{ij}(n) = \mu\,\Delta w_{ij}(n-1)$.

Fahlman suggested an empirical value of 1.75 for μ.

There are some complications in this method. The first is that the step-size calculation requires previous values, which are not available at the start. This is overcome by using the standard back propagation method for the first weight adjustment. The gradient descent weight change is given by

$w_{ij}(n+1) = w_{ij}(n) - \eta\,\nabla w_{ij}(n)$

where η is taken suitably small.

The second problem is that the weight values are unbounded. They can become so large that they cause overflow in the computer. The solution is to multiply each calculated slope by a factor less than 1.0, which reduces the rate of increase of the weights.
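The Quickprop update, the growth-factor limit and the gradient-descent first step can be combined into a per-weight function. This is a sketch under stated assumptions:

- The gradients are ∂E/∂w.
- The learning rate η = 0.1 is illustrative.
- A step falls back to gradient descent whenever the gradient difference is zero.
- The slope-shrinking safeguard against unbounded weights is not included.

The name quickprop_step is hypothetical.

```python
import numpy as np

def quickprop_step(grad, prev_grad, prev_dw, eta=0.1, mu=1.75):
    """One Quickprop weight change, computed independently for every weight.

    grad, prev_grad: dE/dw at steps n and n-1; prev_dw: weight change at step n-1.
    """
    grad = np.asarray(grad, dtype=float)
    if prev_grad is None or prev_dw is None:
        # First step: no previous values yet, so use plain gradient descent.
        return -eta * grad
    denom = prev_grad - grad
    safe = np.where(denom != 0.0, denom, 1.0)   # avoid division by zero
    dw = np.where(denom != 0.0, grad / safe * prev_dw, -eta * grad)
    # Maximum growth factor: a step may not exceed mu times the previous step.
    too_big = np.abs(dw) > mu * np.abs(prev_dw)
    return np.where(too_big, mu * prev_dw, dw)

# Usage: minimize E(w) = (w - 3)^2, whose gradient is 2(w - 3)
w, g_prev, dw_prev = np.array([0.0]), None, None
for _ in range(10):
    g = 2.0 * (w - 3.0)
    dw = quickprop_step(g, g_prev, dw_prev)
    w, g_prev, dw_prev = w + dw, g, dw
print(w)  # very close to 3.0
```

On a quadratic error surface the secant estimate is exact. Once the growth limit stops clipping, a single Quickprop step lands on the minimum: here the first step moves w to 0.6, the clipped second step to 1.65, and the third step to 3.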

3.7.1 Result 

The Quickprop algorithm has been tested for 26 characters. The network has been trained for 80 epochs, and the most suitable network architecture found was 49-16-26. All 26 characters, in the form of 49-dimensional vectors, and the target output, a 26x26 identity matrix, are given during training. The network gives an output equal to the target output. A learning rate of 0.8 and a momentum factor of 0.1 are used. The number of characters recognized correctly varies with the learning rate; the effect is shown in table 3.7. Varying the momentum factor has no effect. This has been tested by varying the momentum factor between 0.1 and 0.9 for learning rates from 0.1 to 1.0.


Learning Rate | Number of Characters Recognized Correctly
0.1 | 1
0.2 | 14
0.3 | 24
0.4 | 19
0.5 | 17
0.6 | 25
0.7 | 6
0.8 | 26
0.9 | 14
1.0 | 13

Table 3.7: Effect of Learning Rate on Number of Characters Recognized Correctly


3.7.2 Merits and Demerits 

Quickprop reduces the training time of back propagation; training hardly takes 2 or 3 seconds.

The network architecture must be selected carefully, and the performance is highly problem dependent. Several trials are required to set acceptable values of the parameters. One problem is that the weight values are unbounded, so they can cause overflow in the computer.

3.8 Kohonen Self-Organizing Map


The Kohonen network is based on unsupervised learning of clusters. The term self-organization means the ability to learn and organize information by itself; thus a self-organizing network performs unsupervised learning.

The Kohonen network consists of a single layer of neurons plus an input layer. Each neuron also receives inputs from other neurons within its layer. In the Kohonen network, the weights are initialized to random values. Learning in this network is winner-take-all (WTA) learning, and the network trained in this way is called the Kohonen network [55]. The neuron with the largest activation value is declared the winner. Only this neuron gives an output; all other neurons are suppressed to zero activation level. The output of a neuron is given as an inhibitory input to other neurons but as an excitatory input to its neighbors. Thus the weights are updated not only for the winner neuron but also for its neighbors. This type of competition within a layer is called lateral inhibition. The neighborhood size decreases with learning, so the number of learning neurons also decreases in each iteration, and finally only the winner neuron learns.

3.8.1 Algorithm 

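Step 1: The training set, consisting of p patterns $\{a_1, a_2, \ldots, a_p\}$, is presented to the input. Each pattern is n-dimensional.

Step 2: The weight matrix is initialized as

$W = \begin{bmatrix} w_1^{t} \\ w_2^{t} \\ \vdots \\ w_p^{t} \end{bmatrix}, \qquad w_i = \begin{bmatrix} w_{i1} & w_{i2} & \cdots & w_{in} \end{bmatrix}^{t}$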

Step 3: The neighborhood distance of the neurons is fixed and the neighborhood matrix is formed.

Step 4: The network is trained using the weight matrix W, the neighborhood matrix M and the stored patterns.

Step 5: The training patterns are presented to the network one by one, and the winner neuron is found for each. The weight adjustment is done by selecting $w_m$ such that

$\lVert a - w_m \rVert = \min_{i} \{\lVert a - w_i \rVert\}, \quad i = 1, 2, \ldots, p$

The index m denotes the winning neuron, whose weight vector $w_m$ is closest to a.

Step 6: The weight vector of the winner neuron is updated as

$\Delta w_m = \alpha\,(a - w_m)$

where α is the learning constant. The remaining weight vectors are left as they are:

$\Delta w_i = 0 \quad \text{for } i \neq m$

α is reduced during training, and the learning slows down.

Step 7: After training, the test vector x is presented to the network, which operates in winner-take-all mode. The response is computed from

$y = \Gamma[Wx]$

where Γ is a diagonal operator that acts on the components of Wx. The dimension of y is p.

Step 8: The largest output is found as

$y_m = \max(y_1, y_2, \ldots, y_p)$

$y_m$ is set to 1 and $y_i = 0$ for $i \neq m$. So the test vector is identified as belonging to cluster m.
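Steps 5 to 8 can be sketched as plain winner-take-all learning. The sketch makes three assumptions:

- Input and weight vectors are normalized to unit length. Under this assumption, the nearest weight vector in step 5 is also the one giving the largest component of Wx in step 7.
- α decays geometrically.
- The neighborhood of steps 3 and 4 has already shrunk to the winner alone.

The function names, the fixed starting weights (standing in for random ones) and the toy data are hypothetical.

```python
import numpy as np

def normalize(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def train_wta(patterns, W, alpha=0.5, decay=0.95, epochs=50):
    """Winner-take-all training: only the winning neuron's weights move (steps 5-6)."""
    A = normalize(patterns)
    W = normalize(W)                                  # one row w_i per neuron
    for _ in range(epochs):
        for a in A:
            m = np.argmin(np.linalg.norm(a - W, axis=1))  # winner: closest w_m
            W[m] = normalize(W[m] + alpha * (a - W[m]))   # move winner toward a
        alpha *= decay                                # learning slows down
    return W

def recall(W, x):
    """Steps 7-8: y = W x; the largest output is set to 1, all others to 0."""
    y = W @ normalize(x)
    out = np.zeros_like(y)
    out[np.argmax(y)] = 1.0
    return out

patterns = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]
W0 = [[0.8, 0.6], [0.6, 0.8]]          # stands in for random initial weights
W = train_wta(patterns, W0)
print(recall(W, [1, 0.1]), recall(W, [0.1, 1]))
```

Each test vector activates exactly one output, and the two clusters end up on different neurons.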

3.8.2 Result


The training set consists of 26 characters, each represented as a 7x7 pattern grid, with 1 representing an on pixel and 0 an off pixel. These 26 characters are classified into 23 classes using 42 neurons. The network has been trained for 5000 epochs. It has been observed that the number of neurons influences the number of classes. The effect of increasing the number of neurons is given in table 3.8.


Number of Neurons | Number of Classes
26 | 20
28 | 20
30 | 21
32 | 22
34 | 22
36 | 23
38 | 23
40 | 23
42 | 24
44 | 24

Table 3.8: Effect of Number of Neurons on the Classification in the Kohonen Network

3.8.3 Merits 

The neuron's activation function has no influence on the performance of the network.

3.9 Neocognitron

The Neocognitron is self-organized by unsupervised learning and acquires the ability to recognize patterns correctly. For self-organization of the network, the patterns are presented to it repeatedly [56, 57]. No prior information about the categories into which these patterns should be classified needs to be given. The Neocognitron itself acquires the ability to classify and correctly recognize these patterns according to the differences in their shapes.

The network has a multilayer structure. Each layer has two planes: one, called the s-plane, consists of s-units; the other, the c-plane, consists of c-units. The units can have both excitatory and inhibitory connections. The network has forward connections from the input layer to the output layer and backward connections from the output layer to the input layer. The forward signals are for pattern classification and recognition. The backward signals are for selective attention, pattern segmentation and associative recall.

3.9.1 Algorithm 

Let u(1), u(2), ..., u(N) be the excitatory inputs and v be the inhibitory input. The output w of the s-cell is given by [58]

$w = \varphi\left[\dfrac{1 + \sum_{\nu=1}^{N} a(\nu)\,u(\nu)}{1 + b\,v} - 1\right]$

where a(ν) and b represent the excitatory and inhibitory interconnecting coefficients respectively.

The function φ[ ] is defined by

$\varphi[x] = \begin{cases} x, & x \ge 0 \\ 0, & x < 0 \end{cases}$

Let e be the sum of all excitatory inputs weighted with their interconnecting coefficients, and h the inhibitory input multiplied by its interconnecting coefficient:

$e = \sum_{\nu=1}^{N} a(\nu)\,u(\nu), \qquad h = b\,v$

Therefore w can be written as

$w = \varphi\left[\dfrac{1 + e}{1 + h} - 1\right] = \varphi\left[\dfrac{e - h}{1 + h}\right]$

When $h \ll 1$, $w \approx \varphi[e - h]$.

In the Neocognitron, the interconnecting coefficients a(ν) and b increase as learning progresses. If the coefficients increase so that $e \gg 1$ and $h \gg 1$, then

$w \approx \varphi\left[\dfrac{e}{h} - 1\right]$

so the output depends on the ratio e/h, not on e and h individually. Therefore, if the excitatory and inhibitory coefficients increase at the same rate, the output of the cell tends to a fixed value.
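The s-cell equation can be checked numerically. The short sketch below (the function names are hypothetical) evaluates $w = \varphi[(1 + e)/(1 + h) - 1]$. It shows both limiting regimes: for small e and h the output is close to e − h, and for large e and h it is close to e/h − 1.

```python
import numpy as np

def phi(x):
    # phi[x] = x for x >= 0 and 0 for x < 0
    return np.maximum(x, 0.0)

def s_cell_output(u, a, v, b):
    """w = phi[(1 + e) / (1 + h) - 1] with e = sum a(nu) u(nu) and h = b v."""
    e = float(np.dot(a, u))   # weighted excitatory input
    h = b * v                 # weighted inhibitory input
    return float(phi((1.0 + e) / (1.0 + h) - 1.0))

# Small e and h: the output is close to e - h = 0.01
print(s_cell_output([0.02], [1.0], 0.01, 1.0))     # about 0.0099
# Large e and h: the output is close to e/h - 1 = 1
print(s_cell_output([1000.0], [1.0], 500.0, 1.0))  # about 0.998
```

When inhibition outweighs excitation, the argument of φ is negative and the cell gives no output.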

Similarly, the input-output characteristic of the c-cell is given by

$\psi[x] = \begin{cases} \dfrac{x}{\alpha + x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$

where α is a positive constant that determines the degree of saturation of the output.

The outputs of the excitatory s- and c-cells are connected only to the excitatory input terminals of other cells. $V_s$- and $V_c$-cells are inhibitory cells, whose outputs are connected only to the inhibitory input terminals of other cells. A $V_s$-cell gives an output proportional to the weighted arithmetic mean of all its excitatory inputs. A $V_c$-cell also has only excitatory inputs, but its output is proportional to the weighted root mean square of its inputs. Let u(1), u(2), ..., u(N) be the excitatory inputs and c(1), c(2), ..., c(N) be the interconnecting coefficients of its input terminals. The output w of this $V_c$-cell is defined by

$w = \sqrt{\sum_{\nu=1}^{N} c(\nu)\,u^{2}(\nu)}$
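The c-cell characteristic and the two inhibitory cells can be sketched in the same way. The sketch assumes the proportionality constants of the V-cells are 1. It also assumes the $V_s$-cell coefficients c(ν) sum to 1, so that Σ c(ν)u(ν) is a weighted arithmetic mean. The function names are hypothetical.

```python
import numpy as np

def psi(x, alpha):
    """c-cell characteristic: x / (alpha + x) for x >= 0, otherwise 0."""
    xp = np.maximum(np.asarray(x, dtype=float), 0.0)  # negative inputs give 0
    return xp / (alpha + xp)

def vs_cell(u, c):
    """V_s-cell: weighted arithmetic mean sum c(nu) u(nu) (c assumed to sum to 1)."""
    return float(np.dot(c, u))

def vc_cell(u, c):
    """V_c-cell: weighted root mean square sqrt(sum c(nu) u(nu)^2)."""
    u = np.asarray(u, dtype=float)
    return float(np.sqrt(np.dot(c, u ** 2)))

print(psi([-1.0, 1.0, 3.0], alpha=1.0))   # 0, 0.5 and 0.75
print(vs_cell([2.0, 4.0], [0.5, 0.5]))    # 3.0
print(vc_cell([3.0, 4.0], [1.0, 1.0]))    # 5.0
```

ψ saturates toward 1 for large inputs, and a larger α makes the saturation slower.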

3.9.2 Storage Capacity 

The capacity of the network can be increased by increasing the number of planes in each layer. However, with more planes per layer, the computer cannot simulate the network because of a lack of memory capacity.

3.9.3 Results 

A nine-layered network U0, US1, UC1, US2, UC2, US3, UC3, US4 and UC4 is taken, so the network has 4 stages preceded by an input layer. There are 21 cell planes in each of the layers US1-US4 and 20 cell planes in each of the layers UC1-UC4. The parameters are listed in table 3.9. As can be seen from the table, the total number of c-cells in layer UC4 is 20 because each c-plane has only one cell. The number of cells in a connectable area is always 5x5 for every s-layer. Hence the number of excitatory inputs to each s-cell is 5x5 in layer US1, and 5x5x20 in layers US2, US3 and US4, because these layers are preceded by c-layers consisting of 20 cell planes each.

During learning, five training patterns, 0, 1, 2, 3 and 4, were presented repeatedly to the input layer; each pattern was presented five times. After repeated presentation of these five patterns, the Neocognitron acquired the ability to classify them according to the differences in their shapes. It has been observed that a different cell responds to each pattern.

Layer | Number of excitatory cells | Number of excitatory input interconnections per cell | Size of an s-column (number of cells per s-column) | r_l | q_l
US1 | 19x19x21 | 5x5 | 5x5x20 | 4.0 | 1.0
UC1 | 15x15x20 | 5x5 | | |
US2 | 13x13x21 | 5x5x20 | | | 12.0
UC2 | 11x11x20 | 5x5 | | |
US3 | 9x9x21 | 5x5x20 | 5x5x20 | 1.5 | 12.0
UC3 | 7x7x20 | 5x5 | | |
US4 | 3x3x21 | 5x5x20 | 3x3x20 | 1.5 | 12.0
UC4 | 20 | 3x3 | | |

Value of α = 0.75

Table 3.9: Architecture and values of parameters used in Neocognitron

3.9.4 Demerits 

The network architecture is very complex and involves a large number of neurons.

3.9.5 Conclusions 

The algorithms of various neural networks that can be used for character recognition have been studied. Their capacity for storing and correctly recognizing all 26 letters of the English alphabet has also been determined. It has been observed that only three algorithms are capable of storing all 26 letters. These three are: (1) Hamming Network and MAXNET, (2) Backpropagation Algorithm and (3) Quickprop. Among these, the performance of Backpropagation and Quickprop largely depends upon the network architecture. The Hamming network does not have this problem. However, it only gives the class index, not the entire prototype vector.

Chapter 4: Different Analysis Criterion


4.1 Introduction 

The various neural network algorithms differ in their learning mechanism: some networks learn by supervised learning and others by unsupervised learning. The capacity for storing patterns and correctly recognizing them differs between networks. Even where some networks are capable of storing the same number of patterns, they differ in the complexity of their architecture. The total number of neurons needed to classify or recognize the patterns differs between algorithms, as does the total number of unknowns. The networks are compared on the basis of these. In this chapter, the performance of the various algorithms under different criteria is presented.

These criteria are: 

1. Noise in Weight: Noise has been introduced in two ways:

(i) By adding random numbers to the weights

(ii) By adding a constant number to all weights of the network

2. Noise in Input: The network is trained with characters without noise and is then tested by presenting each of the characters. It has been observed that the number of characters recognized correctly differs for the various algorithms. Noise is introduced randomly into the input vector at the time of testing, and its effect on the various algorithms has been observed. This has been done by adding random numbers to the test vector.

3. Loss of Connection: In every network, neurons are interconnected by weights. The effect of losing a connection, by setting its weight to zero, on the network's classification or recognition ability has been studied. For every network, the number of connections that can be lost without affecting classification or recognition has been found. For some network algorithms, it has been observed that this number varies from character to character.

4. Missing Information from Test Pattern: The characters are presented in the form of a pattern grid. The influence of missing information has been observed, where missing information means that some of the on pixels in the pattern grid are made off. For all the algorithms, the maximum number of pixels that can be made off such that the character is still recognized correctly has been found. This number varies from character to character for each algorithm. Which pixel is made off, i.e. where the information is missing, also matters.

5. Adding Information to Test Pattern: Adding information means that some of the off pixels in the pattern grid are made on. The number of pixels that can be made on such that the character is still recognized correctly has been found. Like missing information, this also varies from character to character for each algorithm.
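The five test criteria can be expressed as small perturbation helpers applied to a trained network's weights or to a test vector. This is an illustrative sketch only, with three assumptions and simplifications:

- The random numbers are uniform on [0, 1). The text does not state their distribution.
- Pixels are flipped at random, whereas the text notes that the choice of pixel matters.
- The weight matrix layout (one row per neuron, one column per input) is assumed.

All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_random_to_weights(W, divisor=1.0):
    """Criterion 1(i): add uniform random numbers in [0, 1) to every weight,
    optionally dividing them first (e.g. by 10 or 100)."""
    return W + rng.random(W.shape) / divisor

def add_constant_to_weights(W, c):
    """Criterion 1(ii): add the same constant to all weights."""
    return W + c

def add_random_to_input(x, divisor=1.0):
    """Criterion 2: add random numbers to the test vector."""
    return x + rng.random(x.shape) / divisor

def cut_input_connections(W, inputs):
    """Criterion 3: lose every connection from the given input neurons
    (W has one row per neuron and one column per input)."""
    W = W.copy()
    W[:, inputs] = 0.0
    return W

def flip_pixels(x, k, turn_on):
    """Criteria 4 and 5: switch k on pixels off (missing information)
    or k off pixels on (added information), chosen at random."""
    x = x.copy()
    source = 0 if turn_on else 1
    candidates = np.flatnonzero(x == source)
    chosen = rng.choice(candidates, size=k, replace=False)
    x[chosen] = 1 - source
    return x

grid = np.array([1, 1, 1, 0, 0, 0, 1])
print(flip_pixels(grid, 2, turn_on=False).sum())  # 2 on pixels remain
print(flip_pixels(grid, 2, turn_on=True).sum())   # 6 on pixels
```

Each helper returns a new array, so the same trained weights and stored characters can be reused for every test.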

4.2 Effect of Noise in Weight on Various Algorithms 
4.2.1 When Weights Are Varied Randomly 
(i) Bidirectional Associative Memory 

When random numbers are added to the weight matrix, only character A is recognized correctly.

When the random numbers are divided by 10 before being added, all 6 characters are recognized correctly.

(ii) Hopfield Auto associative Memory

When random numbers are added to the weight matrix, the characters are not recognized at all; only dots appear for every character.

When the random numbers are divided by 100 before being added, all 6 characters are recognized correctly.

(iii) Feature Recognition Neural Network 

When random numbers are added to the weight matrix, the network does not work.

(iv) Hamming Network and MAXNET 

When noise is introduced by adding random numbers to the weights connecting the inputs to the neurons of the Hamming network, all the characters are recognized correctly. But if random numbers are added to the weights connecting the neurons of the Hamming network and MAXNET, MAXNET does not work: the Hamming network gives the correct result, but the final output of MAXNET is wrong.

(v) Backpropagation Algorithm 

When random numbers are added to the weight matrices of all layers, 21 characters are recognized correctly. If noise is introduced into the weights of only one layer at a time, all the characters are recognized correctly.

(vi) Quickprop 

When random numbers are added to the weights between the input neurons and the hidden neurons, not a single character is recognized correctly at 80 epochs. When the number of epochs is increased to 85, 11 characters are recognized correctly. For 100 epochs, 17 characters are recognized correctly.

When random numbers are added to the weights connecting the neurons of the hidden layer and the output layer, the actual output equals the target output for 23 characters at 80 epochs. When the epochs are increased to 85, 24 characters are recognized correctly.

When random numbers are added to both layers at the same time, the variation in the number of characters recognized correctly with increasing number of epochs is shown in table 4.1 below.


Number of Epochs | Characters Recognized Correctly
80 | 8
85 | 12
90 | 13
95 | 16

Table 4.1: Effect of Number of Epochs on the Number of Characters Recognized Correctly


Algorithm | Range of % increase of weights for which the network does not work properly | Range of % increase of weights for which there is no effect
BAM | 0.58-49.7 | 0.058-4.97
Hopfield Auto associative Memory | 0.015-99.93 | 0.0015-0.99
Hamming Network and MAXNET | N.A. | 0.015-99.93
Backpropagation Algorithm | 0.0485-53.85 | 0-10.77
Quickprop | 0.12-55.46 | N.A.
ART1 | 0-7.5 | N.A.
Kohonen SOM | N.A. | 0-70

Table 4.2: For each algorithm: effect of % increase in weight

(vii) ART1

When random numbers are added to the weights, the 26 characters are classified into 7 classes.

(viii) Kohonen SOM 

Adding random numbers to the weights has no effect. With 42 neurons, the 26 characters are clustered into 23 classes.

(ix) Neocognitron

It does not work properly after noise is introduced into the weights. Table 4.2 shows the range of percentage increase in weight for each algorithm.

4.2.2 By Adding a Constant Number to All the Weights of the Network

(i) Bidirectional Associative Memory 

When 0.1 is added to all the weights, characters A, L, X and P are recognized correctly. I and M are never recognized, even if a very small number (0.0001) is added.

(ii) Hopfield Auto associative Memory 

When 0.0065 is added to all the weights, characters stored in the network 
are recognized correctly. 

(iii) Feature Recognition Neural Network 

It does work even after adding a very small number.

(iv) Hamming Network and MAXNET 

All the characters are recognized correctly if 3.5 is added to the weights between the input layer and the neurons of the Hamming net part. But if any number is added to the weights connecting the neurons of the Hamming net and MAXNET, the network does not work at all.

(v) Backpropagation Algorithm

There are three weight matrices: the weights connecting the inputs to the input layer, the input layer to the hidden layer, and the hidden layer to the output layer.

If 0.08 is added to all the weights, it does not have any effect on the network. If the weights of only one layer are increased at a time, a larger number can be added: it has been observed that if 0.4 is added to any one of the three weight matrices at a time, the network works as before.

(vi) Quickprop 

If 0.0002 is added to all the weights, the output equals the target output for all 26 characters. If this number is increased, the output differs from the target output for some of the characters.

(vii) ART1

When 0.01 is added to the weights, 14 classes are obtained as before. The only effect is that for characters H, K, M and N, neuron 25 wins instead of neuron 26.

(viii) Kohonen SOM 

If 0.5 is added to all the weights, it does not have any effect; 23 classes are obtained as before.

4.2.3 Effect of Noise in Inputs on Various Algorithms 
Noise is introduced in the input by adding random numbers. 

(i) Bidirectional Associative Memory 

When random numbers are added to the characters A, L, I, X, M and P, the network is able to correctly recognize only A, L, X and P.

(ii) Hopfield Auto associative Memory 

Hopfield network recognizes correctly all the stored characters even after 
introducing noise at the time of testing. 

(iii) Feature Recognition Neural Network

This network does not work after introducing noise randomly in the test 
vector. 

(iv) Hamming Network and MAXNET 

After introducing noise in test vector, the network correctly recognizes all 
the 26 characters. 

(v) ART1

When noise is introduced into the input vector at the time of testing, all the characters are clustered into the same class.

(vi) Kohonen SOM 

The 26 characters are classified into 18 classes when noise is introduced by adding random numbers to the vectors representing them.

4.2.4 Loss of Connection 

In the network, neurons are interconnected, and every interconnection has an interconnecting coefficient called a weight. This section studies how setting some of these weights to zero affects classification or recognition. The number of connections that can be removed without affecting network performance has also been found for each algorithm.

(i) Bidirectional Associative Memory: If the weights connecting 4 input neurons to all the output neurons are set to zero, network performance is not affected.

(ii) Hopfield Auto associative Memory: If the connections of an input neuron to all the output neurons are removed and the pixel corresponding to that neuron is off, it makes no difference. But if that pixel is on, it becomes off in the output.

(iii) Feature Recognition Neural Network: After losing 234 connections between layer one and layer two, the network performs as before.

(iv) Hamming Network and MAXNET: If the connections between the Hamming net and MAXNET are lost, the network does not work. But after a few of the connections from the inputs to the Hamming network are lost, the characters are still classified correctly. Some characters, like Q, are recognized correctly even after the connections of 46 inputs to all the neurons of the Hamming network are lost. The number of redundant connections for each character is shown in table 4.3.


Character | Connections That Can Be Lost
C | 26
T | 34
L, M | 35
J, S, W | 37
F, O | 38
A, B, E, K, N, V | 39
D, G, I, R, U, Z | 40
H, P, X, Y | 41
Q | 46

Table 4.3: Number of Redundant Connections for Each Character

(v) Backpropagation Algorithm: There are three sets of weights. The first connects the inputs to the neurons of the input layer, the second connects the input layer neurons to the hidden layer neurons, and the third connects the hidden layer neurons to the output layer neurons. Suppose the following weights are removed: those connecting 15 inputs to the 35 neurons of the input layer, 15 neurons of the input layer to 15 neurons of the hidden layer, and 8 neurons of the hidden layer to all 26 neurons of the output layer. The network still recognizes all the characters correctly. So the total number of redundant connections is 958 (15x35 + 15x15 + 8x26).

(vi) Quickprop: There are two sets of weights: the first connects the input neurons to the hidden neurons, and the second connects the hidden neurons to the output neurons. Suppose the weights connecting 49 input neurons to 15 hidden neurons, and those connecting 15 hidden neurons to 2 output neurons, are set to zero. The actual output then still equals the target output for 25 characters.

(vii) ART1: If all the weights of the forward network are set to zero, it does not make any difference. All 26 characters are clustered as before.

(viii) Kohonen SOM: If all the weights connecting the inputs to the neurons are made zero, the network still classifies the 26 characters into 23 classes. So making these weights zero has no effect.

4.2.5 Missing Information: Missing information means that some of the on pixels in the pattern grid are made off. For each algorithm, the amount of information that can be missing while the characters are still recognized correctly varies from character to character. Pixels cannot be switched off from just any place; which pixel is switched off also matters. For each algorithm, the way in which information is removed is shown in appendix A. Table 4.4 shows the number of pixels that can be switched off for the stored characters in the various algorithms. The comparison is done for Bidirectional associative memory, Hopfield auto associative memory, feature recognition neural network, Hamming network and MAXNET, ART1, and Kohonen network. From table 4.4 it is clear that ART1 still classifies correctly after losing the greatest amount of information. For some characters, it puts them in the same cluster as before even after 12 to 13 on pixels are made off. Hamming Net and MAXNET can also classify correctly after losing a large amount of information.

4.2.6 Adding Information: Adding information means that some of the off pixels in the pattern grid are made on. In this section, the classification or recognition ability of the networks after adding information is studied. For every algorithm, the number of pixels that can be made on varies from character to character, and which pixels are made on also matters. The appendix shows the pattern grid of the test vector given as input to the different algorithms after adding information. Table 4.5 gives a detailed description of the number of pixels that can be made on for all the characters that can be stored in the various networks. The comparison is done for Bidirectional associative memory, Hopfield auto associative memory, feature recognition neural network, Hamming network and MAXNET, ART1, and Kohonen network. From table 4.5 it is clear that FRNN still classifies correctly after the greatest amount of information is added. For some characters, it classifies correctly even if 31-32 extra pixels are made on.

Table 4.4: Missing Information: Number of Pixels That Can Be Made Off in Different Algorithms

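M | 14 | 7 | 8
N | 24 | 15 | 16 | 10
O | 18 | 5 | 5 | 9
P | 29 | 18 | 4 | 00
Q | 20 | 15 | 4 | 15
R | 14 | 1 | 14
S | 27 | 16 | 2 | 13
T | 9 | 9 | 4 | 10
U | 31 | 19 | 1 | 17
V | 31 | 19 | 2 | 18
W | 30 | 14 | 9 | 14
X | 11 | 8 | 4 | 8
Y | 10 | 14 | 4 | 8
Z | 32 | 19 | 2 | 11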


Table 4.5: Adding Information: Number of Pixels That Can Be Made On in Different Algorithms

Chapter 5 Conclusions


5.1 General Conclusions

A detailed analysis of 9 algorithms (BAM, Hopfield, FRNN, Hamming network and MAXNET, Backpropagation, Quickprop, ART1, Kohonen and Neocognitron) has been performed for pattern recognition of the English letters A-Z. The storage capacity of each algorithm is different. The near-optimal network architecture of each of these algorithms was found, so that it can correctly classify or recognize the stored patterns.

The Algorithms were compared on the basis of following criteria: 

1. Total number of neurons needed

2. Total number of unknowns to be computed 

3. Storage capacity 

4. Noise in weight 

(i) By adding random numbers to the weights 

(ii) By adding a constant number to all the weights 

5. Noise in input 

6. Missing information from the test vector 

7. Adding information to the test vector 

8. Number of redundant connections 
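Criteria 4(i) and 4(ii) can be sketched as follows. The text does not say which distribution the random numbers were drawn from, so the uniform range [-amplitude, amplitude] and the function names below are assumptions for illustration only.

```python
import random
from typing import List

Matrix = List[List[float]]


def add_random_weight_noise(w: Matrix, amplitude: float, seed: int = 0) -> Matrix:
    """Criterion 4(i): add an independent random number to every weight.

    The uniform distribution on [-amplitude, amplitude] is an assumption.
    """
    rng = random.Random(seed)
    return [[x + rng.uniform(-amplitude, amplitude) for x in row] for row in w]


def add_constant_weight_noise(w: Matrix, c: float) -> Matrix:
    """Criterion 4(ii): add the same constant c to every weight."""
    return [[x + c for x in row] for row in w]


w = [[1.0, -1.0], [0.5, 2.0]]
print(add_constant_weight_noise(w, 0.25))  # [[1.25, -0.75], [0.75, 2.25]]
```

Both functions return new matrices, so the trained weights can be perturbed repeatedly at different noise levels without retraining.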
BAM and Hopfield auto associative memory have capacity limitations. 
Both networks can store and correctly recognize only six characters, on 
the condition that the characters are not even slightly similar in shape. 
FRNN recognizes 23 characters correctly: it recognizes O and Q as C, and it 
cannot differentiate R from P. It performs well, but its numbers of neurons 
and connections are much greater than those of the other algorithms. Its 
architecture is, however, much simpler than that of the Neocognitron for a 
similar task. The Neocognitron can be designed to correctly recognize all 
26 characters, but it involves a large number of neurons and connections, 
and the network is very complex. A more recent Neocognitron, which 
recognizes 35 patterns, involves 70045 neurons, which gives an idea of its 
complexity. Only three networks were found to correctly recognize or 
classify all 26 characters. These three are: 

(i) Hamming network and MAXNET 

(ii) Backpropagation Algorithm 

(iii) Quickprop 
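The capacity limitation of the Hopfield auto associative memory noted above can be illustrated with a minimal Python sketch of Hebbian (outer-product) storage with a zero diagonal and sequential bipolar recall. This is a textbook formulation, not necessarily the exact variant used in this work, and the tiny 8-pixel patterns are hypothetical. For comparison, the commonly quoted estimate for random patterns is roughly 0.138N stored patterns for N neurons, about 6.8 for a 49-pixel (7x7) grid; correlated letter shapes usually fare worse, which appears consistent with the requirement that the stored characters be dissimilar.

```python
from typing import List, Sequence

Pattern = List[int]  # bipolar vector, entries -1 or +1


def hopfield_train(patterns: Sequence[Pattern]) -> List[List[int]]:
    """Hebbian storage: w_ij = sum over patterns of x_i * x_j, with w_ii = 0."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def hopfield_recall(w: List[List[int]], x: Pattern, max_sweeps: int = 20) -> Pattern:
    """Update units one at a time until a full sweep changes nothing."""
    y = list(x)
    n = len(y)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            net = sum(w[i][j] * y[j] for j in range(n))
            new = 1 if net > 0 else -1 if net < 0 else y[i]  # keep state on a tie
            if new != y[i]:
                y[i], changed = new, True
        if not changed:
            break
    return y


p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
w = hopfield_train([p1, p2])
probe = [-1] + p1[1:]  # p1 with its first pixel flipped
print(hopfield_recall(w, probe) == p1)  # True
```

Storing more, or more similar, patterns in the same weight matrix adds cross-talk terms to each net input, which is what eventually makes recall fail.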

Among them, in the standard Backpropagation and Quickprop algorithms we 
have to select a suitable network architecture to perform the above task, 
and the number of hidden neurons plays a major role in the performance of 
these two networks. The Hamming network does not have this problem. 
However, the Hamming network retrieves only the class index and not the 
entire prototype vector; it is not able to restore any of the pattern vector 
entries. When compared on the basis of the number of neurons, the Hamming 
network was observed to involve the fewest neurons: it requires only 52, 
compared with 76 for Backpropagation and 91 for Quickprop. FRNN requires 
2371 neurons. The total number of unknowns, however, is smallest for 
Quickprop: it requires 1200 unknowns to be determined, compared with 1976 
for the Hamming network and 2216 for Backpropagation. 

When noise is introduced into the test vector by adding random numbers, the 
Hopfield auto associative memory and the Hamming network recognize all the 
stored characters correctly, so under the noise tested here the recognition 
ability of these networks is not affected. BAM correctly recognizes 4 of its 
6 stored characters. ART1 puts all the characters in the same class after 
noise is introduced. FRNN does not work after random numbers are added to 
the test vector. 
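The reported Hamming and Quickprop figures can be reproduced arithmetically, which helps make clear what is being counted. The breakdown below is our own reconstruction, not stated in the text: it assumes the Hamming count excludes the 49 input units but includes one bias per class neuron plus the 26x26 MAXNET weights, and that the Quickprop network is a 49-16-26 layered net in which input units count as neurons and biases are not counted. The hidden-layer size of 16 is inferred, and only these two figures are reproduced here.

```python
from typing import Sequence, Tuple


def hamming_maxnet_counts(n_inputs: int, n_classes: int) -> Tuple[int, int]:
    """Neurons and unknowns for a Hamming network followed by MAXNET.

    Assumed accounting: input units are not counted as neurons; unknowns are
    the Hamming weights, one bias per class neuron and the m x m MAXNET weights.
    """
    neurons = 2 * n_classes  # Hamming layer + MAXNET layer
    unknowns = n_inputs * n_classes + n_classes + n_classes * n_classes
    return neurons, unknowns


def layered_net_counts(layer_sizes: Sequence[int]) -> Tuple[int, int]:
    """Neurons and weights of a fully connected feed-forward network.

    Assumed accounting: every layer (including the input) counts as neurons,
    and only inter-layer weights (no biases) count as unknowns.
    """
    neurons = sum(layer_sizes)
    unknowns = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    return neurons, unknowns


print(hamming_maxnet_counts(49, 26))     # (52, 1976)
print(layered_net_counts([49, 16, 26]))  # (91, 1200); hidden size 16 is inferred
```

The two functions use different counting conventions on purpose, since those are the conventions that match the reported numbers; a uniform convention would change the ranking of the networks by neuron count.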

ART1 performs best when information is missing from the test vector, and 
the Hamming network also performs well in this case. FRNN performs best 
when information is added to the test vector: it still classifies correctly 
after 31-32 off pixels are made on. The Kohonen network also performs well 
after a large amount of information is added. 
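The behaviour of the Hamming network under missing information can be illustrated with the textbook Hamming/MAXNET formulation: with bipolar exemplars e of length n, weights e/2 and bias n/2 make each class neuron's net input equal to n minus the Hamming distance to its exemplar, and MAXNET then suppresses all but the largest activation through mutual inhibition with 0 < eps < 1/m for m classes. The sketch assumes that formulation and uses hypothetical 3x3 exemplars rather than the 7x7 letters of this work.

```python
from typing import List, Sequence

Vector = List[int]


def hamming_layer(exemplars: Sequence[Vector], x: Vector) -> List[float]:
    """Net input of each class neuron: n/2 + sum_i x_i * e_i / 2 = n - HD(x, e)."""
    n = len(x)
    return [n / 2 + sum(xi * ei / 2 for xi, ei in zip(x, e)) for e in exemplars]


def maxnet(a: List[float], eps: float, max_iter: int = 100) -> int:
    """Mutual inhibition until at most one activation stays positive."""
    a = list(a)
    for _ in range(max_iter):
        if sum(v > 0 for v in a) <= 1:
            break
        total = sum(a)
        a = [max(0.0, v - eps * (total - v)) for v in a]
    return max(range(len(a)), key=lambda j: a[j])  # index of the winning class


CROSS = [1, -1, 1, -1, 1, -1, 1, -1, 1]  # 3x3 "X"
PLUS = [-1, 1, -1, 1, 1, 1, -1, 1, -1]   # 3x3 "+"
RING = [1, 1, 1, 1, -1, 1, 1, 1, 1]      # 3x3 "O"
exemplars = [CROSS, PLUS, RING]

probe = [-1] + CROSS[1:]  # "X" with one on pixel missing (turned off)
print(hamming_layer(exemplars, probe))                   # [8.0, 2.0, 3.0]
print(maxnet(hamming_layer(exemplars, probe), eps=0.2))  # 0
```

Because MAXNET only selects a winner, the result is a class index rather than a restored pattern, which matches the earlier observation that the Hamming network does not retrieve the prototype vector itself.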

The performance of the nine algorithms has been studied under six criteria, 
and it has been observed that different algorithms perform best under 
different criteria. The algorithms have also been compared on the basis of 
the number of neurons and the number of unknowns to be computed. A 
detailed description is given in Table 4.6. 

Table 4.6: Performance of Various Algorithms under Different Criteria 

The notation used in Table 4.6 is as follows: 

Algo1: Bidirectional Associative Memory 
Algo2: Hopfield Auto Associative Memory 
Algo3: Feature Recognition Neural Network 
Algo4: Hamming Network and MAXNET 
Algo5: Backpropagation algorithm 
Algo6: Quickprop 
Algo7: Adaptive Resonance Theory 1 
Algo8: Kohonen Self Organizing Map 
Algo9: Neocognitron 
N.A.: Not Applicable 
N.S.: Not studied 
5.2 Recommendations for Future Work 

• One can investigate ways of enhancing the storage capacity of the 
networks for pattern recognition. 

• The performance of the various algorithms can be studied by varying 
the style in which the patterns are presented, for example by using 
different fonts. 

• One can also study the performance of the networks on handwritten 
patterns. 
Appendix A 


A.1 Character Presentation 
A.2 For Each Algorithm: Character Presentation after Missing Information 


A.2.1 Bidirectional Associative Memory 







A.2.2 Hopfield Auto Associative Memory 

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A.2.3 Feature Recognition Neural Network 

A.2.4 Hamming Network and MAXNET 
A.2.5 Adaptive Resonance Theory 1 
A.2.6 Kohonen SOM 




A.3 For Each Algorithm: Presentation of Characters after Adding Information 

A.3.1 Bidirectional Associative Memory 



A.3.2 Hopfield Auto Associative Memory 
A.3.3 Feature Recognition Neural Network 
A.3.4 Hamming Network and MAXNET 
A.3.5 Adaptive Resonance Theory 1 
Appendix B 


A.1: Representation of the Input Pattern in Each Algorithm 

The pattern grid is 7x7 in six of the algorithms; in FRNN, Backpropagation 
and Neocognitron it is 9x9, 7x5 and 19x19, respectively. Table A.1 gives a 
detailed description of the pattern-grid sizes and of the values with which 
on and off pixels are read in the different algorithms. 


Algorithm                          Size of pattern grid   Off pixels   On pixels
BAM                                7x7                    -1           1
Hopfield Auto Associative Memory   7x7                    -1           1
FRNN                               9x9                    -1           1
Hamming Network and MAXNET         7x7                    -1           1
Backpropagation Algorithm          7x5                     0           1
Quickprop                          7x7                     0           1
ART1                               7x7                     0           1
Kohonen SOM                        7x7                     0           1
Neocognitron                       19x19                   0           1


Table A.1: For Each Algorithm: Size of the Pattern Grid and the Values with 
Which Off and On Pixels Are Read 
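The encodings in Table A.1 can be captured in a small lookup table. The sketch below assumes that characters are drawn as strings of '#' (on) and '.' (off) and flattened row by row; the drawing convention, the dictionary keys and the function name are hypothetical, while the grid sizes and pixel values are taken from Table A.1.

```python
from typing import Dict, List, Tuple

# (grid rows, grid columns, value of an off pixel) per Table A.1; on pixels are 1.
ENCODING: Dict[str, Tuple[int, int, int]] = {
    "BAM": (7, 7, -1),
    "Hopfield": (7, 7, -1),
    "FRNN": (9, 9, -1),
    "Hamming/MAXNET": (7, 7, -1),
    "Backpropagation": (7, 5, 0),
    "Quickprop": (7, 7, 0),
    "ART1": (7, 7, 0),
    "Kohonen SOM": (7, 7, 0),
    "Neocognitron": (19, 19, 0),
}


def encode(grid: List[str], algorithm: str) -> List[int]:
    """Flatten a character drawn with '#' (on) and '.' (off), row by row."""
    rows, cols, off = ENCODING[algorithm]
    if len(grid) != rows or any(len(r) != cols for r in grid):
        raise ValueError(f"{algorithm} expects a {rows}x{cols} grid")
    return [1 if ch == "#" else off for row in grid for ch in row]


T7x5 = ["#####", "..#..", "..#..", "..#..", "..#..", "..#..", "..#.."]
print(encode(T7x5, "Backpropagation")[:10])  # [1, 1, 1, 1, 1, 0, 0, 1, 0, 0]
```

Keeping the grid size and off value together in one table makes it harder to feed, say, a 0/1 vector to a network that was trained on bipolar -1/1 inputs.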